AstrAI

Commit Graph

Author	SHA1	Message	Date
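ViperEkura	0ba8c70ce1	fix: fix several MLA bugs and shrink the test model parameters - fix MLA kv_b_proj output dimension and q_rope split offset - wire the MLA config through from ModelConfig to DecoderBlock - rope_theta config is no longer ignored; MLA uses qk_rope_head_dim - tie_weight is checked with is True so that None no longer takes effect implicitly - correct norm_eps/rope base type annotations - shrink test model parameters (dim=8, head_dim=4) - add forward-pass tests for 6 architecture configs × 2 scenarios	2026-05-16 14:57:43 +08:00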
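ViperEkura	e12f1a7ee5	feat: BaseModelConfig + DeepSeekMoE + factory pattern replacing if/else - BaseModelConfig: exact field matching via fields() + type coercion + warnings for unknown keys - DeepSeekMoE: shared experts + routed experts + top-K gating - AttnFactory/FFNFactory: decorator-based registration, DecoderBlock has zero branches - config selects components via attn_type/ffn_type	2026-05-15 20:34:52 +08:00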
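ViperEkura	ef25efffa2	refactor: split module.py into a components subpackage - rope/linear/norm/embedding/mlp/attention/decoder_block each get their own file - dependencies are one-directional with no cycles - public interface unchanged; no external changes needed	2026-05-15 20:08:36 +08:00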
ViperEkura	9096e413c3	refactor: RotaryEmbedding merges cos/sin into a single complex cache - get_rotary_emb() returns a complex tensor instead of Tuple[cos, sin] - RotaryEmbedding stores a single freqs_cis buffer instead of separate cos_cached/sin_cached - forward rebuilds the complex values with view_as_complex	2026-05-15 18:03:59 +08:00
ViperEkura	205b40bd28	refactor: restructure the cache and inference parameter system, separating storage from allocation - merge GenerationRequest/GenerationParams and unify the parameter name as max_tokens - split PagePool/PrefixCache into Allocator + PrefixCache + PagePool - move KV storage into a standalone Storage class; PagedCache → KVCache, CacheView → KvcacheView - remove LRU from Allocator.inc_ref to prevent races; add a negative-page guard to Storage.write - add threading.Lock to Allocator/PrefixCache/TaskTable for thread safety - server.py: pass the app object to uvicorn.run, fixing an import error - benchmark.py: adapt to the new KVCache API	2026-05-14 20:05:08 +08:00
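ViperEkura	18fe6e9339	refactor: eliminate several duplicated patterns, unify factories and parameter passing - AutoModel inherits BaseFactory, dropping its own Registry (-30 lines) - executor.execute_prefill: remove a duplicated forward code block (bug) - train_callback: remove the contradictory issubclass check on a Protocol - engine.py: internal methods all take GenerationParams, with validation consolidated - protocol.py: SSEBuilder class → functions; handle() uses GenerationParams - StreamContext: dynamic attributes become explicit dataclass fields - BaseFactory gains a get_component_class method	2026-05-14 18:00:50 +08:00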
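ViperEkura	2196c34c52	refactor: restructure the inference module architecture, introduce design patterns and group files - add a protocol.py protocol layer; the Template Method pattern removes 45% of the duplication between streaming and non-streaming branches - SSEBuilder unifies SSE construction; StopChecker handles stop_sequence detection on its own - AnthropicHandler tracks already-emitted text, fixing duplicate deltas on stop - server.py routes shrink from about 100 lines to 3 - split into core/ (cache/executor/scheduler/task) and api/ (protocol/server) - external callers keep the two-level import path (from astrai.inference import Name) - remove all divider comments; code is grouped naturally by meaning	2026-05-14 17:42:37 +08:00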
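ViperEkura	466c2e1efd	fix: in-place write after expand in process_attention_mask raises an alias error - the view produced by pad.view.expand has multiple elements pointing at the same memory, so the attend &= write fails - use .expand().clone() to get independent memory before the in-place write	2026-05-14 16:30:31 +08:00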
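ViperEkura	7e26d848ab	perf: apply_rotary_emb switches to complex multiplication - get_rotary_emb keeps real-valued cos/sin storage; forward combines them into complex values - apply_rotary_emb uses view_as_complex complex multiplication instead of multiple view/mul/stack ops - remove the Tuple[Tensor, Tensor] type from GQA, MLA and DecoderBlock - decoding time drops from 4.24s to 3.49s	2026-05-14 16:20:16 +08:00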
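ViperEkura	ed95ef245c	perf: eliminate the position_ids GPU sync in RotaryEmbedding.forward - preallocate the cos/sin cache to max_len and remove the runtime dynamic resizing logic - remove the unused max_len_cached attribute - cumulative decoding time 4.23s → 3.99s (+5.7%)	2026-05-14 15:53:21 +08:00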
ViperEkura	6d6ef99e66	perf: eliminate the position_ids GPU sync in PagedCache.write, speeding up decoding by 15% - CacheView.write derives start_pos as total_len - k.size(1) instead of position_ids[0,0].item() - remove the now-unused position_ids parameter from GQA/MLA/DecoderBlock - PagedCache.write parameter position_ids:Tensor → start_pos:int	2026-05-14 15:37:48 +08:00
ViperEkura	c0effc9f5b	refactor: positional encoding switches to position_ids [B,S], simplifying attention mask construction - RotaryEmbedding/CacheView accept position_ids instead of start_pos - process_attention_mask builds a per-position causal mask from position_ids >= arange - during training or without a KV cache, position_ids=None is handled internally - remove redundant input_mask construction in executor/benchmark	2026-05-14 13:26:31 +08:00
ViperEkura	a3c8296135	fix: out-of-bounds crash when page cache allocation fails + termination when the length limit is exceeded - astrai/inference/scheduler.py: add_task checks max_seq_len and sends a STOP signal immediately when it is exceeded - astrai/inference/scheduler.py: _maybe_alloc_page returns bool; on allocation failure the task is marked ABORTED and sent STOP - astrai/inference/scheduler.py: _execute_decode filters out tasks whose allocation failed, avoiding page_table out-of-bounds access - astrai/inference/scheduler.py: _remove_finished_tasks cleans up ABORTED tasks and frees their pages - astrai/inference/scheduler.py: _execute_prefill input_mask now covers the full prompt_len - astrai/model/transformer.py: the seq_mask is None branch adds the missing start_pos + seq_len columns	2026-05-10 20:14:38 +08:00
ViperEkura	c95ace41aa	fix: expand crash during prefill caused by a too-short attention mask - astrai/inference/scheduler.py: prefill input_mask changes from [batch, seq_len] to [batch, prompt_len], covering all KV positions - astrai/model/transformer.py: the seq_mask is None branch adds the missing start_pos + seq_len columns, avoiding a size mismatch when expanding a non-singleton dimension	2026-05-10 19:56:41 +08:00
ViperEkura	283bcaf2ff	fix: fix missing/duplicate CLI arguments, device_ids out-of-bounds, inconsistent generate parameter names, scheduler step timing, non-streaming truncation and other bugs - train.py: add --batch_size and --grpo_clip_eps, remove 3 duplicate --group_size definitions - generate.py: rename --model_dir to --param_path to match the README - automodel.py: from_pretrained gains a strict parameter (default True) - parallel/setup.py: fix device_ids index out of bounds - train_callback.py: move scheduler.step() to on_step_end - test_train_strategy.py: add the missing optimizer.step() in tests - engine.py: non-streaming mode now loops until all tasks finish and adds remove_task cleanup - scheduler.py: Task gets a _pages_freed flag to rule out double frees - trainer.py: clamp accumulation_steps=0 to 1 - tokenizer.py: save_pretrained adds a _tokenizer is None check - benchmark.py: fix the outdated ModelConfig import path - inference/__init__.py: fix a stale docstring	2026-05-09 14:36:42 +08:00
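ViperEkura	30cc2d67a4	refactor: replace fixed slots with a paged KV cache; remove PrefixCache and related dead code - replace the fixed-slot KV cache with PagedCache + CacheView; attention layers index only indirectly through page_table - remove PrefixCache (radix tree) and all prefix-cache hit/insert/release logic in the scheduler - remove unused functions: pin, version, free_count, _mark_seq_mask and the seq_mask allocation - fix a negative offset in write during multi-page prefill that produced wrong chunk calculations - _make_page_table_tensor builds the tensor once from a concatenated list instead of assigning element by element - clean up model interface parameters: kv_cache, slot_indices → paged_cache (CacheView) - trim docstrings to one line; remove redundant section comments and old code - fix the missing import pytest in test_scheduler_concurrency.py	2026-05-08 20:44:05 +08:00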
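ViperEkura	a6f5ff3b37	fix: remove_task did not free the KV cache slot, deadlocking the second conversation turn - remove_task() now frees the KV cache slot and prefix cache references - when allocation fails in _refill_active_batch, the remaining tasks are pushed back onto waiting_queue - the main loop gets a try/except fallback that sends _STOP to every task - refactor: server.py global variables become a ServerState class; automodel.py uses Registry instead of a bare dict; TrainContextBuilder's with_* methods are merged into build()	2026-05-08 14:53:04 +08:00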
ViperEkura	b89f8436ea	refactor: move KV cache slot mapping down into the model's attention layers; remove _remap_kv and _writeback_kv	2026-05-06 20:01:22 +08:00
ViperEkura	b0eff02446	chore: update the RMSNorm implementation	2026-04-06 20:27:01 +08:00
ViperEkura	64b78ecce3	fix: add rotary position embedding extension	2026-04-06 13:29:39 +08:00
ViperEkura	f2ffdf60d0	chore: fix misspellings	2026-04-06 10:37:19 +08:00
ViperEkura	3fee87897d	chore: fix spelling errors	2026-04-06 09:28:16 +08:00
ViperEkura	39766aa1dc	chore: rename classes, tidy up import order	2026-04-05 22:27:57 +08:00
ViperEkura	fc278d17ab	feat: implement a dynamic model registration mechanism	2026-04-05 19:38:12 +08:00
ViperEkura	b531232a9b	style: switch to explicit imports	2026-04-04 16:02:49 +08:00
ViperEkura	0852b852f8	refactor: streamline parameter passing, clean up import style	2026-04-03 22:06:32 +08:00
ViperEkura	2e009cf59a	chore: update project name	2026-03-31 09:34:11 +08:00

27 Commits
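Commits 7e26d848ab and 9096e413c3 replace the cos/sin form of rotary embeddings with complex multiplication. The pure-Python sketch below shows why the two forms are equivalent. Several details are assumptions: the interleaved channel-pair layout (the layout torch.view_as_complex implies when the last dimension has size 2), the default base of 10000, and the function names precompute_freqs_cis, apply_rotary_emb_complex and apply_rotary_emb_real, which are hypothetical. The real code works on tensors, not lists.

```python
import cmath
import math
from typing import List


def precompute_freqs_cis(head_dim: int, max_len: int, base: float = 10000.0) -> List[List[complex]]:
    """Hypothetical helper: e^{i * m * theta_j} for every position m and channel pair j."""
    # One frequency per (even, odd) channel pair: theta_j = base^(-2j / head_dim)
    thetas = [base ** (-2 * j / head_dim) for j in range(head_dim // 2)]
    return [[cmath.exp(1j * m * t) for t in thetas] for m in range(max_len)]


def apply_rotary_emb_complex(x: List[float], pos: int, freqs_cis: List[List[complex]]) -> List[float]:
    """Rotate one head vector by treating adjacent channel pairs as complex numbers."""
    out: List[float] = []
    for j, f in enumerate(freqs_cis[pos]):
        # (x0 + i*x1) * (cos + i*sin): a single complex multiply per pair
        z = complex(x[2 * j], x[2 * j + 1]) * f
        out.extend([z.real, z.imag])
    return out


def apply_rotary_emb_real(x: List[float], pos: int, head_dim: int, base: float = 10000.0) -> List[float]:
    """Reference cos/sin formulation of the same rotation, for comparison."""
    out: List[float] = []
    for j in range(head_dim // 2):
        angle = pos * base ** (-2 * j / head_dim)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * j], x[2 * j + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out
```

Storing one complex cache (freqs_cis) instead of separate cos and sin caches keeps the rotation to a single multiply per channel pair. That is the same simplification the commits describe.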
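Commits c0effc9f5b and 6d6ef99e66 describe two related ideas. The first builds the causal mask from position_ids >= arange. The second derives the cache write offset on the host as total_len - k.size(1), so position_ids never has to be read back from the GPU. The list-based sketch below illustrates both. The function names are hypothetical, and the real code compares tensors with broadcasting.

```python
from typing import List


def position_ids_for_step(start_pos: int, num_new: int, batch: int = 1) -> List[List[int]]:
    """Hypothetical helper: positions of the new tokens, shape [B, S]."""
    return [list(range(start_pos, start_pos + num_new)) for _ in range(batch)]


def causal_mask_from_position_ids(position_ids: List[List[int]], kv_len: int) -> List[List[List[bool]]]:
    """mask[b][i][k] is True when query i of batch b may attend to key k.

    Equivalent to the broadcast position_ids[..., None] >= arange(kv_len)[None, :].
    """
    return [[[p >= k for k in range(kv_len)] for p in row] for row in position_ids]


def start_pos_from_lengths(total_len: int, new_tokens: int) -> int:
    """Write offset computed from host-side ints: total cached length minus new tokens."""
    return total_len - new_tokens
```

Because the comparison works on absolute positions, the same function covers full prefill, chunked prefill and single-token decode, without a separate input_mask.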
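Commit e12f1a7ee5 replaces if/else component selection with AttnFactory/FFNFactory decorator registration driven by attn_type/ffn_type. The sketch below shows one way that pattern can work. AttnFactory, FFNFactory, BaseFactory and DecoderBlock are names from the log, but their bodies, and the register/create methods, are assumptions. The component classes and Config are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class BaseFactory:
    """Maps a string key to a component class; each subclass gets its own registry."""

    _registry: Dict[str, type]

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry = {}  # separate table per factory (attention vs FFN)

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(component: type) -> type:
            if name in cls._registry:
                raise KeyError(f"{name!r} already registered in {cls.__name__}")
            cls._registry[name] = component
            return component

        return decorator

    @classmethod
    def create(cls, name: str, *args, **kwargs):
        if name not in cls._registry:
            raise ValueError(f"unknown {cls.__name__} key {name!r}; known: {sorted(cls._registry)}")
        return cls._registry[name](*args, **kwargs)


class AttnFactory(BaseFactory):
    pass


class FFNFactory(BaseFactory):
    pass


@AttnFactory.register("gqa")
class GQA:  # placeholder attention component
    def __init__(self, dim: int):
        self.dim = dim


@AttnFactory.register("mla")
class MLA:  # placeholder attention component
    def __init__(self, dim: int):
        self.dim = dim


@FFNFactory.register("mlp")
class MLP:  # placeholder feed-forward component
    def __init__(self, dim: int):
        self.dim = dim


@dataclass
class Config:  # hypothetical config holding the selector strings
    dim: int = 8
    attn_type: str = "gqa"
    ffn_type: str = "mlp"


class DecoderBlock:
    def __init__(self, config: Config):
        # No branching: the config strings pick the registered classes.
        self.attn = AttnFactory.create(config.attn_type, config.dim)
        self.ffn = FFNFactory.create(config.ffn_type, config.dim)
```

Adding a new attention variant then means writing one decorated class. DecoderBlock itself does not change.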
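Several commits concern page bookkeeping. In 205b40bd28, the Allocator is guarded by threading.Lock. In a3c8296135, a failed allocation is reported so the scheduler can mark the task ABORTED. In 283bcaf2ff, a flag prevents double frees. The sketch below shows a reference-counted page allocator along those lines. PageAllocator, alloc, dec_ref and num_free are hypothetical names. inc_ref is borrowed from the log, but its signature here is assumed. Rejecting a double free with an error is this sketch's choice, not necessarily the repo's.

```python
import threading
from typing import Dict, List, Optional


class PageAllocator:
    """Hypothetical reference-counted page allocator guarded by a single lock."""

    def __init__(self, num_pages: int):
        self._free: List[int] = list(range(num_pages))
        self._refs: Dict[int, int] = {}
        self._lock = threading.Lock()

    def alloc(self) -> Optional[int]:
        """Return a page id, or None when the pool is exhausted (the caller decides to abort)."""
        with self._lock:
            if not self._free:
                return None
            page = self._free.pop()
            self._refs[page] = 1
            return page

    def inc_ref(self, page: int) -> None:
        """Share an allocated page (for example, a reused prefix)."""
        with self._lock:
            if page not in self._refs:
                raise ValueError(f"page {page} is not allocated")
            self._refs[page] += 1

    def dec_ref(self, page: int) -> None:
        """Drop one reference; the page returns to the free list at zero."""
        with self._lock:
            count = self._refs.get(page)
            if count is None:
                raise ValueError(f"page {page} is not allocated (double free?)")
            if count == 1:
                del self._refs[page]
                self._free.append(page)
            else:
                self._refs[page] = count - 1

    def num_free(self) -> int:
        with self._lock:
            return len(self._free)
```

Returning None instead of raising on exhaustion lets the scheduler treat a failed allocation as an ordinary outcome: it can filter that task out of the decode batch rather than index past the end of the page table.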