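- setup_parallel takes a local_rank parameter instead of deriving it from environment variables - TorchrunStrategy reads LOCAL_RANK from the environment; LocalStrategy uses rank - _detect_launcher() performs tiered detection, replacing the inline RANK check - _run_single_rank is the single entry point, removing the duplication between _run_single and _run_multi - Graceful exit: except BaseException terminates the child processes and re-joins them - The gradient_checkpointing_modules check is extracted into an external variable |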
||
|---|---|---|
| benchmark.py | ||
| evaluate_mmlu.py | ||
| generate.py | ||
| perplexity.py | ||
| preprocess.py | ||
| server.py | ||
| train.py | ||
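
The launcher detection and local-rank handling described in the commit message above could look roughly like the sketch below. It is an illustration only, not the repository's code: `detect_launcher`, `resolve_local_rank`, the three tier names, and the choice of `TORCHELASTIC_RUN_ID` as the torchrun marker are all assumptions. Only `LOCAL_RANK`, `RANK`, and the torchrun/local split come from the commit message.

```python
import os
from typing import Mapping, Optional


def detect_launcher(env: Optional[Mapping[str, str]] = None) -> str:
    """Tiered launcher detection (hypothetical tiers, most specific first)."""
    env = os.environ if env is None else env
    # Tier 1: torchrun exports TORCHELASTIC_RUN_ID together with LOCAL_RANK.
    if "TORCHELASTIC_RUN_ID" in env and "LOCAL_RANK" in env:
        return "torchrun"
    # Tier 2: some other launcher exported the usual rank variables.
    if "RANK" in env and "WORLD_SIZE" in env:
        return "external"
    # Tier 3: nothing is set, so the script spawns its own workers.
    return "local"


def resolve_local_rank(
    launcher: str, rank: int, env: Optional[Mapping[str, str]] = None
) -> int:
    """Pick the local rank once, so setup code never reads the environment."""
    env = os.environ if env is None else env
    if launcher == "torchrun":
        # Torchrun-style strategy: trust the launcher's LOCAL_RANK.
        return int(env["LOCAL_RANK"])
    # Local strategy: on a single node the global rank is the local rank.
    return rank


if __name__ == "__main__":
    launcher = detect_launcher()
    print(launcher, resolve_local_rank(launcher, rank=0))
```

Passing the resolved value into the setup function explicitly keeps the environment lookup in one place, which appears to be the intent of the `setup_parallel` change.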