AstrAI

Commit Graph

Author	SHA1	Message	Date
ViperEkura	2a65c3314c	fix : 修复 created 时间戳、bin 多 shard 覆盖与文档遗漏 - openai.py/anthropic.py: created 从 0 改为 int(time.time()) - openai.py: ChatCompletionRequest 不支持参数非默认值时 warning - pipeline.py: bin 多 shard 使用子目录避免静默覆盖 - storage.py: MmapStore/detect_format 支持多 shard 聚合加载 - architecture.md: mermaid 类图新增 Pipeline 类 - preprocessing.md: 新增多 shard 输出布局与 Python API 示例 - protocol.py: docstring "6 methods" 改为 "5 methods"	2026-05-30 23:03:42 +08:00
ViperEkura	1c2ff05a6d	docs : 三轮深度验证修复文档与代码不一致 - architecture.md: 修正 unwrap_model 返回类型、Config Optional 标注、方法签名错误、类名错误 - training.md: 补充 on_error 回调、修正训练循环顺序、补全策略参数、model.safetensors - inference.md: 修正 GenerationRequest 参数顺序、async 语法、KVCache 描述、temperature 约束 - dataflow.md: 补充 Store.load/fetch 流程、修正可选参数默认值 - README/params: 多 GPU 示例补全 --parallel_mode、文档表补充 preprocessing.md - preprocessing.md: Chat 模式算法补全 BOS token 步骤	2026-05-30 21:41:06 +08:00
ViperEkura	69207e2c57	refactor : 基于声明式 JSON 配置的预处理管线重构 - 用工厂注册的 MaskBuilder（chat/instruction/text）替换硬编码的 _transform_* 方法 - mask 规则以 role-to-action 映射声明在配置中，与 chat_template 完全解耦 - 单次编码 + role-span 追踪替代两次编码 + 长度差计算 mask 的方式 - 支持多轮对话训练：所有 assistant 轮次参与训练，而非仅最后一轮 - 新建 astrai.preprocessing 包（builder.py + pipeline.py），删除 astrai/preprocess.py - CLI 精简为 --config 参数，所有参数通过 PipelineConfig JSON 配置 - 新增 PipelineConfig、InputConfig、ProcessingConfig、OutputConfig dataclass - 文档：assets/docs/preprocessing.md - 27 个测试覆盖 mask builder、pipeline、配置序列化、工厂注册	2026-05-30 20:45:09 +08:00

Author

SHA1

Message

Date

ViperEkura

2a65c3314c

fix : 修复 created 时间戳、bin 多 shard 覆盖与文档遗漏

- openai.py/anthropic.py: created 从 0 改为 int(time.time())
- openai.py: ChatCompletionRequest 不支持参数非默认值时 warning
- pipeline.py: bin 多 shard 使用子目录避免静默覆盖
- storage.py: MmapStore/detect_format 支持多 shard 聚合加载
- architecture.md: mermaid 类图新增 Pipeline 类
- preprocessing.md: 新增多 shard 输出布局与 Python API 示例
- protocol.py: docstring "6 methods" 改为 "5 methods"

2026-05-30 23:03:42 +08:00

ViperEkura

1c2ff05a6d

docs : 三轮深度验证修复文档与代码不一致

- architecture.md: 修正 unwrap_model 返回类型、Config Optional 标注、方法签名错误、类名错误
- training.md: 补充 on_error 回调、修正训练循环顺序、补全策略参数、model.safetensors
- inference.md: 修正 GenerationRequest 参数顺序、async 语法、KVCache 描述、temperature 约束
- dataflow.md: 补充 Store.load/fetch 流程、修正可选参数默认值
- README/params: 多 GPU 示例补全 --parallel_mode、文档表补充 preprocessing.md
- preprocessing.md: Chat 模式算法补全 BOS token 步骤

2026-05-30 21:41:06 +08:00

ViperEkura

69207e2c57

refactor : 基于声明式 JSON 配置的预处理管线重构

- 用工厂注册的 MaskBuilder（chat/instruction/text）替换硬编码的 _transform_* 方法
- mask 规则以 role-to-action 映射声明在配置中，与 chat_template 完全解耦
- 单次编码 + role-span 追踪替代两次编码 + 长度差计算 mask 的方式
- 支持多轮对话训练：所有 assistant 轮次参与训练，而非仅最后一轮
- 新建 astrai.preprocessing 包（builder.py + pipeline.py），删除 astrai/preprocess.py
- CLI 精简为 --config 参数，所有参数通过 PipelineConfig JSON 配置
- 新增 PipelineConfig、InputConfig、ProcessingConfig、OutputConfig dataclass
- 文档：assets/docs/preprocessing.md
- 27 个测试覆盖 mask builder、pipeline、配置序列化、工厂注册

2026-05-30 20:45:09 +08:00

3 Commits