AstrAI

Commit Graph

Author	SHA1	Message	Date

ViperEkura

3583c46b66

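feat: prefix caching for the inference engine (KV cache reuse)

- cache.py: add module-level page_hash(), a polynomial rolling hash
  function; PagedCache gains record_page/lookup_prefix/inc_ref, and free()
  now cleans up the hash mapping automatically
- scheduler.py: Task gains _prefix_cached_tokens; _refill_active_batch first
  looks up cache-hit pages (inc_ref), then allocates the remaining pages;
  merge _execute_prefill into a single method that groups tasks by
  (prompt_len, start_pos) and runs full/partial prefill in batches;
  _record_page_hashes registers hashes of full pages; fix device/dtype
  defaults, changing them from hard-coded values to None (auto-detect the
  model's device)
- test: mock model gains dtype/device so that auto-detection works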

2026-05-09 23:53:57 +08:00
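
The page-hash bookkeeping described in this commit can be illustrated with a minimal sketch. This is not the repository's code: only the names page_hash, record_page, lookup_prefix, inc_ref and free come from the commit message. The class name ToyPagedCache, the constants, the hash chaining on the previous page's hash, and the rule that free() drops a hash once the reference count reaches zero are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

PAGE_SIZE = 4            # tokens per page (assumed for the demo)
_BASE = 1_000_003        # polynomial base (assumed)
_MOD = (1 << 61) - 1     # large prime modulus (assumed)


def page_hash(tokens: Sequence[int], parent: int = 0) -> int:
    """Polynomial rolling hash of one page, chained on the previous page's hash."""
    h = parent
    for t in tokens:
        h = (h * _BASE + t + 1) % _MOD
    return h


class ToyPagedCache:
    """Hypothetical bookkeeping only: page ids, ref counts and hashes, no tensors."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages: List[int] = list(range(num_pages))
        self.ref: Dict[int, int] = {}
        self.hash_to_page: Dict[int, int] = {}
        self.page_to_hash: Dict[int, int] = {}

    def alloc(self) -> int:
        page = self.free_pages.pop()
        self.ref[page] = 1
        return page

    def inc_ref(self, page: int) -> None:
        self.ref[page] += 1

    def record_page(self, page: int, h: int) -> None:
        self.hash_to_page[h] = page
        self.page_to_hash[page] = h

    def lookup_prefix(self, tokens: Sequence[int]) -> List[int]:
        """Return cached page ids covering the longest run of full pages."""
        hit: List[int] = []
        h = 0
        for start in range(0, len(tokens) - PAGE_SIZE + 1, PAGE_SIZE):
            h = page_hash(tokens[start:start + PAGE_SIZE], h)
            page = self.hash_to_page.get(h)
            if page is None:
                break
            hit.append(page)
        return hit

    def free(self, page: int) -> None:
        self.ref[page] -= 1
        if self.ref[page] == 0:
            del self.ref[page]
            h = self.page_to_hash.pop(page, None)
            # Only drop the mapping if it still points at this page.
            if h is not None and self.hash_to_page.get(h) == page:
                del self.hash_to_page[h]
            self.free_pages.append(page)


# Request A fills two full pages plus one extra token.
cache = ToyPagedCache(num_pages=8)
prompt_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
pages_a = [cache.alloc() for _ in range(3)]
h = 0
for i in range(2):  # only full pages are registered
    h = page_hash(prompt_a[i * PAGE_SIZE:(i + 1) * PAGE_SIZE], h)
    cache.record_page(pages_a[i], h)

# Request B shares the first eight tokens, so it can reuse two pages.
prompt_b = [1, 2, 3, 4, 5, 6, 7, 8, 42, 43]
hit_pages = cache.lookup_prefix(prompt_b)
for p in hit_pages:
    cache.inc_ref(p)
prefix_cached_tokens = len(hit_pages) * PAGE_SIZE
```

In this sketch, a scheduler would start prefill for request B at position prefix_cached_tokens and allocate fresh pages only for the remaining tokens. Chaining each hash on its parent is one way to make a page match only when everything before it matches too; whether the actual page_hash() does this is not stated in the commit message.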

ViperEkura

30cc2d67a4

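refactor: replace fixed slots with a paged KV cache; remove PrefixCache and related dead code

- Replace the fixed-slot KV cache with PagedCache + CacheView; the attention layer now indexes the cache only indirectly, through page_table
- Remove PrefixCache (radix tree) and all prefix cache hit/insert/release logic in the scheduler
- Remove unused functions: pin, version, free_count, _mark_seq_mask, and the seq_mask allocation
- Fix write computing the wrong chunk during multi-page prefill because offset went negative
- _make_page_table_tensor now builds a list and creates the tensor once, instead of assigning element by element
- Clean up model interface parameters: kv_cache, slot_indices → paged_cache (CacheView)
- Trim docstrings to a single line; remove redundant section comments and old code
- Fix missing import pytest in test_scheduler_concurrency.py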

2026-05-08 20:44:05 +08:00
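
The page_table indirection and the multi-page write mentioned in this commit can be sketched as follows. This is an illustrative toy using plain Python lists, not the repository's PagedCache or CacheView: the class ToyPagePool, its methods and PAGE_SIZE are hypothetical. The point it shows is that taking the in-page offset as `pos % PAGE_SIZE` keeps it non-negative, so the chunk size per page stays well defined when a write spans several pages.

```python
from typing import List, Optional

PAGE_SIZE = 4  # slots per page (assumed for the demo)


class ToyPagePool:
    """Hypothetical flat pool of pages; each page holds PAGE_SIZE slots."""

    def __init__(self, num_pages: int) -> None:
        self.data: List[List[Optional[str]]] = [
            [None] * PAGE_SIZE for _ in range(num_pages)
        ]

    def write(self, page_table: List[int], start_pos: int, values: List[str]) -> None:
        """Write values at logical positions start_pos, start_pos + 1, ..."""
        pos = start_pos
        done = 0
        while done < len(values):
            page = page_table[pos // PAGE_SIZE]   # logical page -> physical page
            offset = pos % PAGE_SIZE              # always in [0, PAGE_SIZE)
            chunk = min(PAGE_SIZE - offset, len(values) - done)
            self.data[page][offset:offset + chunk] = values[done:done + chunk]
            pos += chunk
            done += chunk

    def read(self, page_table: List[int], length: int) -> List[Optional[str]]:
        """Gather the first `length` logical positions through the page table."""
        return [
            self.data[page_table[p // PAGE_SIZE]][p % PAGE_SIZE]
            for p in range(length)
        ]


pool = ToyPagePool(num_pages=6)
page_table = [5, 2, 0]                       # physical pages need not be contiguous
pool.write(page_table, 0, list("abcdef"))    # prefill spanning two pages
pool.write(page_table, 6, list("ghi"))       # continues mid-page, then spills over
sequence = pool.read(page_table, 9)
```

Here the sequence owns physical pages 5, 2 and 0 in that logical order, and readers never touch a physical index directly. A real implementation would hold key/value tensors per layer and gather with tensor indexing rather than Python loops.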

ViperEkura

44d7a4e959

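refactor: apply design patterns to tidy the inference module's import structure

- Add cache.py: SlotAllocator object pool + PrefixCacheManager
- Add sampling.py: composable Temperature/TopK/TopP strategies
- TaskStatus now uses Enum; GenerationParams follows the value object pattern
- Move _STOP to cache.py, removing the lightweight engine→scheduler coupling
- Update test import paths; ruff format check passes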

2026-05-08 16:57:57 +08:00
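
A rough sketch of what composable sampling strategies, an Enum status and a value-object parameter type can look like is given below. Only the names TaskStatus, GenerationParams, Temperature, TopK and TopP come from the commit message. The enum members, the field names and defaults, the function-based strategy style and the helper build_pipeline are assumptions for illustration and may differ from sampling.py.

```python
import math
import random
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List

Logits = List[float]
Transform = Callable[[Logits], Logits]
NEG_INF = float("-inf")


class TaskStatus(Enum):          # member names are hypothetical
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"


@dataclass(frozen=True)          # immutable value object; fields are assumed
class GenerationParams:
    temperature: float = 1.0     # must be > 0 in this sketch
    top_k: int = 0               # 0 disables top-k
    top_p: float = 1.0           # 1.0 disables top-p


def softmax(logits: Logits) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature(t: float) -> Transform:
    return lambda logits: [x / t for x in logits]


def top_k(k: int) -> Transform:
    def apply(logits: Logits) -> Logits:
        if k <= 0 or k >= len(logits):
            return list(logits)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        keep = set(order[:k])
        return [x if i in keep else NEG_INF for i, x in enumerate(logits)]
    return apply


def top_p(p: float) -> Transform:
    def apply(logits: Logits) -> Logits:
        if p >= 1.0:
            return list(logits)
        probs = softmax(logits)
        order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
        keep, total = set(), 0.0
        for i in order:          # smallest set whose mass reaches p
            keep.add(i)
            total += probs[i]
            if total >= p:
                break
        return [x if i in keep else NEG_INF for i, x in enumerate(logits)]
    return apply


def build_pipeline(params: GenerationParams) -> Transform:
    """Compose the strategies in a fixed order: temperature, top-k, top-p."""
    steps = [temperature(params.temperature), top_k(params.top_k), top_p(params.top_p)]

    def run(logits: Logits) -> Logits:
        for step in steps:
            logits = step(logits)
        return logits
    return run


def sample(logits: Logits, rng: random.Random) -> int:
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


params = GenerationParams(temperature=0.5, top_k=2)
filtered = build_pipeline(params)([2.0, 1.0, 0.5, -1.0])
token = sample(filtered, random.Random(0))
status = TaskStatus.FINISHED
```

Each strategy is a function from logits to logits, so strategies can be reordered or dropped without touching the sampler. Freezing GenerationParams means a request's settings cannot be mutated after it is queued, which is the usual motivation for a value object.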

3 Commits