- `gradient_as_bucket_view=True`: zero-copy gradient reduction
- `static_graph=True`: skips the per-iteration bucket rebuild
- `broadcast_buffers=False`: skips the buffer broadcast
- AdamW `fused=True`: fused optimizer kernel

| Name |
|---|
| benchmark.py |
| generate.py |
| perplexity.py |
| server.py |
| train.py |
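
The four settings named in the commit message map onto keyword arguments of PyTorch's `DistributedDataParallel` and `torch.optim.AdamW`. The sketch below shows one way they might be wired together. It is not taken from `train.py`: the names `DDP_KWARGS`, `ADAMW_KWARGS`, and `wrap_for_training`, and the default learning rate, are hypothetical. It also assumes the process group is already initialized and the model already sits on this rank's CUDA device.

```python
# Sketch only. DDP_KWARGS, ADAMW_KWARGS, wrap_for_training, and the lr default
# are illustrative names/values, not the repository's actual code.

DDP_KWARGS = {
    # Gradients become views into the all-reduce buckets, avoiding a copy.
    "gradient_as_bucket_view": True,
    # Tells DDP the set of used/unused parameters is the same every iteration.
    "static_graph": True,
    # Skips synchronizing module buffers at the start of each forward pass.
    "broadcast_buffers": False,
}

ADAMW_KWARGS = {
    # Selects the fused optimizer implementation.
    "fused": True,
}


def wrap_for_training(model, lr=3e-4):
    """Wrap `model` in DDP and build an AdamW optimizer with the flags above.

    Assumes torch.distributed.init_process_group() has already been called
    and that `model` has been moved to this rank's CUDA device.
    """
    # Imported here so the settings above can be inspected without PyTorch.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    ddp_model = DDP(model, **DDP_KWARGS)
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=lr, **ADAMW_KWARGS)
    return ddp_model, optimizer
```

Keeping the flags in plain dictionaries makes it easy to toggle one at a time when measuring their effect, for example with a script such as `benchmark.py`.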