- 训练循环从三重(epoch→batched→batch)改为二重(epoch→batch) - batch_size → batch_per_device, accumulation_steps → grad_accum_steps - scheduler 移入 step block 对齐 optimizer 更新步 - GradientClippingCallback 改用 on_step_begin 避免零梯度裁剪 - 移除 _train_impl 误导性的 -> Checkpoint 标注 - total_steps 修除为向下取整并精简为一行 - warmup_steps 改为 warmup_ratio (默认0.05) |
||
|---|---|---|
| .. | ||
| demo | ||
| tools | ||
| docker.sh | ||
| pre_commit.sh | ||