AstrAI/astrai/inference
ViperEkura 6d6ef99e66 perf: eliminate the position_ids GPU sync in PagedCache.write, speeding up decoding by 15%
- CacheView.write derives start_pos from total_len - k.size(1) instead of calling position_ids[0,0].item()

- Remove the position_ids parameter, which is no longer used, from GQA/MLA/DecoderBlock

- PagedCache.write parameter changed: position_ids: Tensor → start_pos: int
2026-05-14 15:37:48 +08:00
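
The start_pos derivation described in the commit message can be sketched as follows. This is a minimal illustration rather than the repository's code: `ToyPagedCache` and `derive_start_pos` are hypothetical names, the cache stores plain Python values instead of tensors, and `total_len` is assumed to be the sequence length after the new tokens are written. The point is that `num_new` (which stands for `k.size(1)` in the commit) is shape metadata available on the host, whereas `position_ids[0,0].item()` reads tensor data and forces a GPU-to-CPU synchronization.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPagedCache:
    """Hypothetical stand-in for a paged KV cache; holds plain Python values."""

    page_size: int
    pages: List[List[object]] = field(default_factory=list)

    def write(self, new_items: List[object], start_pos: int) -> None:
        # Place each new item at its absolute position, allocating pages lazily.
        for offset, item in enumerate(new_items):
            page_idx, slot = divmod(start_pos + offset, self.page_size)
            while len(self.pages) <= page_idx:
                self.pages.append([None] * self.page_size)
            self.pages[page_idx][slot] = item


def derive_start_pos(total_len: int, num_new: int) -> int:
    """Return total_len - num_new using host-side integers only.

    num_new plays the role of k.size(1): a shape lookup, not a data read.
    Assumption: total_len already counts the tokens being written.
    """
    if not 0 <= num_new <= total_len:
        raise ValueError("num_new must be between 0 and total_len")
    return total_len - num_new


cache = ToyPagedCache(page_size=4)
# Prefill: 5 prompt tokens, so the sequence length after the write is 5.
cache.write(list("abcde"), derive_start_pos(total_len=5, num_new=5))
# Decode step: 1 new token, so the sequence length after the write is 6.
cache.write(["f"], derive_start_pos(total_len=6, num_new=1))
```

In this sketch the prefill write starts at position 0 and the decode write starts at position 5, so the single decoded token lands in the second slot of the second page.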
..
__init__.py refactor: split page management out of TaskManager, move STOP to task.py 2026-05-11 14:04:31 +08:00
cache.py perf: eliminate the position_ids GPU sync in PagedCache.write, speeding up decoding by 15% 2026-05-14 15:37:48 +08:00
engine.py chore: decouple Executor/Scheduler/TaskManager, fix page leak on stop, remove the ServerState global singleton 2026-05-12 13:47:55 +08:00
executor.py refactor: batch decode in buckets by page, build position_ids per task 2026-05-14 14:22:11 +08:00
sample.py refactor: split page management out of TaskManager, move STOP to task.py 2026-05-11 14:04:31 +08:00
scheduler.py refactor: batch decode in buckets by page, build position_ids per task 2026-05-14 14:22:11 +08:00
server.py chore: decouple Executor/Scheduler/TaskManager, fix page leak on stop, remove the ServerState global singleton 2026-05-12 13:47:55 +08:00
task.py chore: decouple Executor/Scheduler/TaskManager, fix page leak on stop, remove the ServerState global singleton 2026-05-12 13:47:55 +08:00