AstrAI/tests/inference
ViperEkura 6d6ef99e66 perf: eliminate the position_ids GPU sync in PagedCache.write, speeding up decoding by 15%
- CacheView.write derives start_pos from total_len - k.size(1) instead of position_ids[0,0].item()

- Remove the no-longer-used position_ids parameter from GQA/MLA/DecoderBlock

- PagedCache.write parameter changed: position_ids: Tensor → start_pos: int
2026-05-14 15:37:48 +08:00
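
The commit above replaces a `.item()` read on a device tensor (which forces the host to wait for the GPU) with host-side integer arithmetic. The following is a minimal pure-Python sketch of that arithmetic only, not the repository's code: `derive_start_pos` and `ToyCache` are hypothetical names, and a flat list stands in for the paged KV cache. It assumes `total_len` is the sequence length after the write and `new_len` plays the role of `k.size(1)`.

```python
# Hypothetical sketch: these names are illustrative, not AstrAI's actual API.

def derive_start_pos(total_len, new_len):
    """Return the position of the first newly written token.

    total_len: sequence length after this write, tracked on the host as an int.
    new_len:   number of tokens being written (the role of k.size(1)).
    """
    if new_len < 0 or new_len > total_len:
        raise ValueError("new_len must be between 0 and total_len")
    return total_len - new_len


class ToyCache:
    """Flat list standing in for a paged KV cache (assumption, for illustration)."""

    def __init__(self):
        self.slots = []

    def write(self, tokens, start_pos):
        # start_pos is a plain int, so no device tensor has to be read here.
        self.slots[start_pos:start_pos + len(tokens)] = tokens


cache = ToyCache()

# Prefill: 4 prompt tokens are written, so total_len is 4 after the write.
prompt = ["a", "b", "c", "d"]
cache.write(prompt, derive_start_pos(total_len=4, new_len=len(prompt)))

# Decode: one token per step; total_len grows by one each time.
cache.write(["e"], derive_start_pos(total_len=5, new_len=1))
```

In this sketch the prefill write starts at position 0 and the decode write starts at position 4, matching what a per-token position tensor would have reported, without a device-to-host copy.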
conftest.py chore: decouple Executor/Scheduler/TaskManager, fix page leak on stop, remove the ServerState global singleton 2026-05-12 13:47:55 +08:00
test_cache.py perf: eliminate the position_ids GPU sync in PagedCache.write, speeding up decoding by 15% 2026-05-14 15:37:48 +08:00
test_engine.py chore: decouple Executor/Scheduler/TaskManager, fix page leak on stop, remove the ServerState global singleton 2026-05-12 13:47:55 +08:00
test_sample.py test: add missing unit tests for the inference module (cache/sample/engine/task) 2026-05-12 12:17:57 +08:00
test_scheduler.py style: rename test_scheduler_concurrency to test_scheduler 2026-05-12 12:24:36 +08:00
test_server.py chore: decouple Executor/Scheduler/TaskManager, fix page leak on stop, remove the ServerState global singleton 2026-05-12 13:47:55 +08:00
test_task.py test: add missing unit tests for the inference module (cache/sample/engine/task) 2026-05-12 12:17:57 +08:00