ViperEkura
  • Joined on 2026-04-02
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-17 20:30:52 +08:00
d0e3464663 docs: fix class and field names in the docs that did not match the code
2c2697390d feat: add GradientCheckpointingCallback
7621f05d3f docs: change the AdamW beta default to (0.9, 0.95)
Compare 3 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-17 16:45:35 +08:00
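10ebd7211f feat: add the Muon optimizer
42a391f0fb feat: add a validation loop to training
97c7ac0f4f refactor: rename Transformer to AutoRegressiveLM and add EmbeddingEncoder
8f1b32f2b6 fix: remove the redundant request parameter and make the tokenizer more robust
c241a5dcef refactor: improve parallel training configuration and launch management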
Compare 5 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-17 12:04:04 +08:00
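44dab27fdc feat: validate required fields when loading datasets
a44fd22a99 fix: fix parameter passing issues in training and the model
8a11a7d444 fix: fix two parameter passing issues in the training script
1d54491809 refactor: use recursive submodule init instead of a uniform normal_(0.006)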
Compare 4 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-17 10:31:56 +08:00
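ad9f4d9cf6 refactor: switch generate_ar to streaming output and remove redundant comments
e1638a7ade fix: correct AdamW hyperparameter defaults and the documentation examples
f91bfee33e refactor: unify Config serialization under the BaseConfig base class
d7a7f570ed refactor: change the training loop to two nested iterations and unify parameter naming
7dea929788 refactor: store checkpoints as separate .pt files in the HF style, with the callback taking over resume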
Compare 10 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-15 23:39:25 +08:00
3d12a03909 docs: split the documentation and add the classes and relationship lines missing from the class diagram
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-15 22:56:13 +08:00
c169659611 docs: fix the class diagram, data flow, parameter docs, and contribution guide under assets/docs/
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-15 21:37:58 +08:00
e12f1a7ee5 feat: BaseModelConfig + DeepSeekMoE + factory pattern in place of if/else
ef25efffa2 refactor: split module.py into a components subpackage
19532440b4 chore: bump version to 1.3.5
Compare 3 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-15 18:11:08 +08:00
9096e413c3 refactor: RotaryEmbedding merges cos/sin into a single complex cache
9d5e9fa6c4 perf: add gradient_as_bucket_view/static_graph/broadcast_buffers to DDP, use fused AdamW
08dde46778 fix: fix the step/backward order in the training loop and restructure it as three nested loops
Compare 3 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-14 21:38:17 +08:00
513f1f7826 perf: switch waiting_queue to a deque, reducing pull_candidates from O(n²) to O(1)
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-14 21:31:43 +08:00
e3382f6bb5 fix: fix several correctness and concurrency issues in the inference engine's batch decode
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-14 21:27:36 +08:00
29b5717a38 fix: fix several correctness and concurrency issues in the inference engine's batch decode
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-14 21:00:44 +08:00
f0339022c1 fix: add a chat template and system prompt to the batch inference example
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-14 20:32:13 +08:00
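d8da2cf17c docs: fix class names, method signatures, and module ownership in the docs that did not match the source
205b40bd28 refactor: rework the cache and inference parameter system, separating storage from allocation
18fe6e9339 refactor: eliminate several duplicated patterns, unify factories and parameter passing
2196c34c52 refactor: restructure the inference module architecture, introduce design patterns, and group files
466c2e1efd fix: in-place write after expand in process_attention_mask caused an alias error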
Compare 18 commits »
ViperEkura pushed to main at ViperEkura/postgraduate-prep 2026-05-14 15:40:39 +08:00
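89a3b16ef5 feat: integration, universal (tangent half-angle) substitution: add discriminant analysis (a²-b² determines the arctan/ln form)
2fdf8e35ab feat: add missed problem: integral of 1/(1+√((1+x)/x)) (rationalize the denominator + complete the square)
f3dfb5273d feat: add missed problem: integral of 1/(a+b cosx) (universal substitution + case analysis by discriminant)
900e6c3e62 refactor: align the plan with the Zhang Yu + Wangdao courses, refine by subtopic, aim to finish the first pass by the end of June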
Compare 4 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-11 17:52:52 +08:00
38e18fdfd3 refactor: PagedCache as a Facade, extract PagePool/PrefixCache/TaskTable
4753958f92 refactor: move page state into PagedCache, make Task a pure domain object
73d6cc0f26 refactor: strip page management out of TaskManager, move STOP to task.py
317ed90bac refactor: split the scheduler into TaskManager + Executor
Compare 4 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-10 21:07:53 +08:00
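951df8155c perf: vectorize gather
a58fab8d6e fix: max_seq_len check now sends STOP only when the prompt exceeds the limit; any max_tokens excess is clamped
a3c8296135 fix: out-of-bounds crash on page cache allocation failure + terminate when the length limit is exceeded
c95ace41aa fix: attention mask too short during prefill caused expand to crash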
Compare 4 commits »
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-10 18:25:42 +08:00
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-10 18:23:33 +08:00
a3bde30fb1 feat: serving infrastructure: bounded queue/timeouts/graceful shutdown/metrics
3da428e0e4 perf: persistent prefix cache + LRU eviction for PagedCache
133a9de98f feat: _generate_streaming supports batch mode
Compare 3 commits »
ViperEkura pushed to main at ViperEkura/postgraduate-prep 2026-05-10 17:31:11 +08:00
6335d21418 feat: add miscellaneous notes (personal findings and a summary of scattered patterns)
ViperEkura pushed to main at ViperEkura/AstrAI 2026-05-10 17:25:42 +08:00
523eacf5fe release: v1.3.4
cffedaad5e perf: eliminate CPU busy-waiting in non-streaming inference and reduce redundant GPU tensor allocation in decode
Compare 2 commits »