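- Use Checkpoint.load() instead of manually loading model.safetensors, so that optimizer/scheduler state is restored - TrainContextBuilder restores the optimizer and scheduler state_dict from checkpoint.extra - ResumableDistributedSampler.__len__ returns the number of remaining samples rather than the total - Clear state_dict before training so that mp.spawn does not pickle a 7GB object |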
||
|---|---|---|
| demo | ||
| tools | ||
| docker.sh | ||
| pre_commit.sh | ||