AstrAI/astrai
ViperEkura 4145d35e3c refactor: rework checkpoint loading, pass paths instead of objects
- model: nn.Module -> model_fn factory function; only strings cross the spawn boundary
- Trainer.train(resume_dir=path): Checkpoint is no longer passed via pickle
- TrainContextBuilder.with_resume_dir(path): auto-detects meta.json to branch between resume and from-scratch
- CheckpointCallback: split state_dict collection (all ranks) from disk writes (rank-0), fixing an FSDP deadlock
- serialization: load_torch supports broadcast, removing _load_extra/_load_torch_broadcast
- optimizer/scheduler restore logic inlined into build(), executed after executor.prepare()
- pyproject.toml: ruff excludes build/ so CI does not scan build artifacts
2026-05-27 20:15:29 +08:00
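The resume/from-scratch branching described in the commit message above could look roughly like the sketch below. This is not the repository's code: `detect_resume` and the `step` field are hypothetical names chosen for illustration. Only the idea of checking for `meta.json` inside a directory path comes from the commit message.

```python
import json
from pathlib import Path
from typing import Optional


def detect_resume(resume_dir: Optional[str]) -> Optional[dict]:
    """Return the parsed meta.json if resume_dir holds one, else None.

    Hypothetical helper: a None result means "train from scratch",
    a dict result means "resume from this checkpoint directory".
    """
    if resume_dir is None:
        return None
    meta_path = Path(resume_dir) / "meta.json"
    if not meta_path.is_file():
        # Directory missing or empty: treat as a fresh run.
        return None
    with meta_path.open("r", encoding="utf-8") as f:
        return json.load(f)


def describe_run(resume_dir: Optional[str]) -> str:
    """Summarize which branch a given path would take."""
    meta = detect_resume(resume_dir)
    if meta is None:
        return "from-scratch"
    # "step" is an assumed field name, not taken from the repository.
    return f"resume@{meta.get('step', 0)}"
```

Because only a string path is involved, a value like this can cross a process-spawn boundary without pickling any checkpoint object, which matches the motivation stated in the commit message.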
..
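config refactor: rework checkpoint loading, pass paths instead of objects 2026-05-27 20:15:29 +08:00
dataset fix: restore optimizer/scheduler state and the sampler's remaining length when resuming training 2026-05-26 13:50:25 +08:00
inference fix: ProgressBar outputs to stdout by default 2026-05-26 13:27:05 +08:00
model refactor: simplify _disable_random_init, move scheduler into the sync block 2026-05-26 17:05:25 +08:00
parallel feat: add FSDP parallel backend 2026-05-25 19:43:14 +08:00
tokenize fix: remove redundant request parameter and improve tokenizer robustness 2026-05-17 12:52:18 +08:00
trainer refactor: rework checkpoint loading, pass paths instead of objects 2026-05-27 20:15:29 +08:00
__init__.py fix: fix list type loss in to_dict and the OpenAI stop parameter being ignored 2026-05-19 21:07:07 +08:00
factory.py refactor: factory kwargs filtering and component parameter cleanup 2026-05-16 16:47:41 +08:00
protocols.py refactor: restructure the training backend around the Executor pattern 2026-05-24 20:35:44 +08:00
serialization.py refactor: rework checkpoint loading, pass paths instead of objects 2026-05-27 20:15:29 +08:00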