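- astrai/inference/scheduler.py: bounded queue (max_queue_size); enqueueing when the queue is full raises RuntimeError
-> request timeout detection (deadline + _abort_expired_tasks); timed-out tasks are aborted, their pages released, and their callbacks notified
-> stop() now runs in drain mode: wait for active tasks to finish naturally, then force cleanup
-> get_stats() extended with latency P50/P95/P99 + cache hit rate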
- astrai/inference/engine.py: generate/generate_async gain a timeout parameter
-> _generate_streaming/_generate_non_streaming catch add_task exceptions and clean up
- astrai/inference/server.py: add /metrics endpoint (Prometheus format)
-> chat completions endpoint catches RuntimeError and returns 503
-> configure_server passes through max_queue_size/request_timeout
- astrai/inference/cache.py: add lookup_hits/lookup_misses counters
- tests/: fix stats key total_tasks -> total_requests
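
Illustrative sketch of the bounded-queue and deadline behavior described above. This is not the code in scheduler.py: BoundedScheduler, Task, on_abort, and the injectable `now` argument are hypothetical names chosen for the example, and page release is omitted.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Task:
    # Hypothetical task record; the real scheduler's task type may differ.
    task_id: str
    deadline: Optional[float] = None  # absolute monotonic time, None = no timeout
    on_abort: Optional[Callable[[str], None]] = None


class BoundedScheduler:
    """Hypothetical stand-in for a scheduler with a bounded queue."""

    def __init__(self, max_queue_size: int) -> None:
        self.max_queue_size = max_queue_size
        self._queue: Deque[Task] = deque()

    def add_task(
        self,
        task_id: str,
        timeout: Optional[float] = None,
        on_abort: Optional[Callable[[str], None]] = None,
        now: Optional[float] = None,
    ) -> Task:
        # Reject instead of blocking when the queue is already full.
        if len(self._queue) >= self.max_queue_size:
            raise RuntimeError("scheduler queue is full")
        now = time.monotonic() if now is None else now
        deadline = None if timeout is None else now + timeout
        task = Task(task_id, deadline, on_abort)
        self._queue.append(task)
        return task

    def abort_expired_tasks(self, now: Optional[float] = None) -> List[str]:
        # Drop every task whose deadline has passed and notify its callback.
        now = time.monotonic() if now is None else now
        kept: Deque[Task] = deque()
        expired: List[str] = []
        for task in self._queue:
            if task.deadline is not None and task.deadline <= now:
                expired.append(task.task_id)
                if task.on_abort is not None:
                    task.on_abort(task.task_id)
            else:
                kept.append(task)
        self._queue = kept
        return expired
```

A caller such as an HTTP handler could catch the RuntimeError from add_task and map it to a 503 response, which matches the server.py change listed above.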