- Divide total_steps by accumulation_steps so that it matches how often optimizer.step() is called
- Clamp warmup_steps with min so that lr_decay_steps cannot become negative

| File |
|---|
| benchmark.py |
| generate.py |
| perplexity.py |
| server.py |
| train.py |
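
The two adjustments in the commit note can be illustrated with a small sketch. This is not the code from train.py: the function names `compute_schedule_steps` and `lr_at`, the floor division, and the linear-warmup plus cosine-decay shape are all assumptions made for illustration. Only the identifiers `total_steps`, `accumulation_steps`, `warmup_steps`, and `lr_decay_steps` come from the note itself.

```python
import math


def compute_schedule_steps(batches_per_epoch, num_epochs, accumulation_steps, warmup_steps):
    """Return (total_steps, warmup_steps, lr_decay_steps) counted in optimizer steps.

    Hypothetical helper: the real train.py may compute these differently.
    """
    # With gradient accumulation, optimizer.step() runs once per
    # accumulation_steps micro-batches, so the scheduler horizon shrinks
    # by the same factor. Floor division is an assumption here.
    total_steps = max(1, (batches_per_epoch * num_epochs) // accumulation_steps)
    # Clamp warmup so it can never exceed the horizon; otherwise the
    # subtraction below would go negative on short runs.
    warmup_steps = min(warmup_steps, total_steps)
    lr_decay_steps = total_steps - warmup_steps
    return total_steps, warmup_steps, lr_decay_steps


def lr_at(step, base_lr, min_lr, warmup_steps, lr_decay_steps):
    """Linear warmup followed by cosine decay (an assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    if lr_decay_steps == 0:
        # The whole run was warmup; nothing left to decay over.
        return min_lr
    progress = min(1.0, (step - warmup_steps) / lr_decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # 100 micro-batches per epoch, 2 epochs, accumulate over 4 micro-batches.
    print(compute_schedule_steps(100, 2, 4, 10))
    # A very short run where the requested warmup exceeds the horizon.
    print(compute_schedule_steps(10, 1, 4, 100))
```

Without the division, the scheduler would be sized for more steps than the optimizer ever takes, so the decay phase would be cut off before reaching its final learning rate. Without the clamp, a short run with a long requested warmup would produce a negative decay length.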