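docs: restructure README, add table-of-contents navigation to all docs
- README: add an end-to-end Getting Started flow that merges Quick Start and Demo; deduplicate and trim
- Chinese README: sync with the English structure; preprocessing config now uses the `seq` strategy
- inference.md: document the SSE streaming format, error responses, and the /stats endpoint
- params.md: expand into a CLI reference with parameter tables for server/generate/preprocess
- dataflow.md: split into tokenization/format detection/backend subsections; add a flow diagram
- architecture/training/inference/preprocessing: add table-of-contents navigation
- Remove the CI badge from README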
parent d096b6e29e
commit 8ab7564d02
147
README.md
|
|
@ -12,7 +12,6 @@
|
||||||
<img src="https://img.shields.io/github/v/release/ViperEkura/AstrAI?label=Release&color=76bad9" alt="release">
|
<img src="https://img.shields.io/github/v/release/ViperEkura/AstrAI?label=Release&color=76bad9" alt="release">
|
||||||
<img src="https://img.shields.io/github/stars/ViperEkura/AstrAI?style=flat&label=Stars&color=76bad9" alt="stars">
|
<img src="https://img.shields.io/github/stars/ViperEkura/AstrAI?style=flat&label=Stars&color=76bad9" alt="stars">
|
||||||
<img src="https://img.shields.io/github/forks/ViperEkura/AstrAI?style=flat&label=Forks&color=76bad9" alt="forks">
|
<img src="https://img.shields.io/github/forks/ViperEkura/AstrAI?style=flat&label=Forks&color=76bad9" alt="forks">
|
||||||
<img src="https://img.shields.io/github/actions/workflow/status/ViperEkura/AstrAI/tests.yml?label=CI&color=76bad9" alt="ci">
|
|
||||||
</div>
|
</div>
|
||||||
<br>
|
<br>
|
||||||
|
|
||||||
|
|
@ -29,7 +28,8 @@
|
||||||
## 📖 Table of Contents
|
## 📖 Table of Contents
|
||||||
|
|
||||||
- [Features](#features)
|
- [Features](#features)
|
||||||
- [Quick Start](#quick-start)
|
- [Getting Started](#getting-started)
|
||||||
|
- [Demo](#demo)
|
||||||
- [Documentation](#documentation)
|
- [Documentation](#documentation)
|
||||||
- [Contributing](#contributing)
|
- [Contributing](#contributing)
|
||||||
- [Community](#community)
|
- [Community](#community)
|
||||||
|
|
@ -50,33 +50,43 @@
|
||||||
- 🤗 **HuggingFace-Style API**: AutoModel/AutoTokenizer APIs inspired by HuggingFace for easy model and tokenizer loading.
|
- 🤗 **HuggingFace-Style API**: AutoModel/AutoTokenizer APIs inspired by HuggingFace for easy model and tokenizer loading.
|
||||||
- 🔌 **Dual API Compatibility**: Supports both OpenAI and Anthropic chat completion APIs out of the box.
|
- 🔌 **Dual API Compatibility**: Supports both OpenAI and Anthropic chat completion APIs out of the box.
|
||||||
|
|
||||||
### Quick Start
|
### Getting Started
|
||||||
|
|
||||||
#### Installation
|
End-to-end walkthrough in 5 steps:
|
||||||
|
|
||||||
|
**1. Install**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
git clone https://github.com/ViperEkura/AstrAI.git
|
git clone https://github.com/ViperEkura/AstrAI.git
|
||||||
cd AstrAI
|
cd AstrAI
|
||||||
pip install -e .
|
pip install -e .
|
||||||
|
# pip install -e ".[dev]" # optional: dev dependencies (pytest, ruff)
|
||||||
```
|
```
|
||||||
|
|
||||||
For development dependencies:
|
**2. Download model**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install -e ".[dev]"
|
python scripts/demo/download.py # downloads 1B checkpoint to params/
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Download Pre-trained Model
|
**3. Preprocess data**
|
||||||
|
|
||||||
Download pre-trained model weights (1B bilingual checkpoint) to `params/`:
|
Create `pretrain.json` (preprocessing config for `seq` strategy):
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"version": 1,
|
||||||
|
"input": {"sections": [{"field": "text", "action": "train"}]},
|
||||||
|
"preprocessing": {"max_seq_len": 2048},
|
||||||
|
"output": {"storage_format": "bin"}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/demo/download.py
|
python scripts/tools/preprocess.py data/*.jsonl -o output/ -c pretrain.json
|
||||||
```
|
```
|
||||||
|
|
||||||
Or download manually from [HuggingFace](https://huggingface.co/ViperEk/KHAOSZ) into `params/`.
|
**4. Train**
|
||||||
|
|
||||||
#### Train a Model
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||||
|
|
@ -103,15 +113,54 @@ nohup python scripts/tools/train.py \
|
||||||
> out.log 2> err.log &
|
> out.log 2> err.log &
|
||||||
```
|
```
|
||||||
|
|
||||||
Full reference at [Parameter Guide](assets/docs/params.md).
|
**5. Serve & query**
|
||||||
|
|
||||||
#### Generate Text
|
```bash
|
||||||
|
# Terminal 1: start server
|
||||||
|
python scripts/tools/server.py --param_path ./params --device cuda
|
||||||
|
|
||||||
|
# Terminal 2: query
|
||||||
|
curl http://localhost:8000/v1/chat/completions \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":512}'
|
||||||
|
```
|
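The curl query in step 5 can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server is listening on `localhost:8000` as in the example above, and the helper names `build_chat_request` and `send` are illustrative, not part of AstrAI.

```python
import json
import urllib.request

# Assumption: server.py is running locally on its default port.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("Hello"))` returns the decoded response; its exact schema is described in the Inference Guide rather than assumed here.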
||||||
|
|
||||||
|
### Demo
|
||||||
|
|
||||||
|
Check out the demos in the `scripts/demo/` folder:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Download model weights (required before running demos)
|
||||||
|
python scripts/demo/download.py # model → params/
|
||||||
|
|
||||||
|
# Interactive streaming chat (multi-turn, maintains history)
|
||||||
|
python scripts/demo/stream_chat.py
|
||||||
|
# Type your message after >>, type !exit to quit
|
||||||
|
|
||||||
|
# Batch generation (5 hardcoded prompts, non-streaming)
|
||||||
|
python scripts/demo/generate_batch.py
|
||||||
|
|
||||||
|
# Single-prompt autoregressive streaming
|
||||||
|
python scripts/demo/generate_ar.py
|
||||||
|
```
|
||||||
|
|
||||||
|
All generation demos use `temperature=0.8`, `top_p=0.95`, `top_k=50`, `max_tokens=2048` by default and require `params/` to contain model weights (run `download.py` first).
|
||||||
|
|
||||||
|
Watch a video walkthrough on [bilibili](https://www.bilibili.com/video/BV1fuLB6yEj6).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
See [Documentation](#documentation) for full references beyond the examples above.
|
||||||
|
|
||||||
|
#### Text Generation
|
||||||
|
|
||||||
|
Batch generation from a JSONL file:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/tools/generate.py \
|
python scripts/tools/generate.py \
|
||||||
--param_path /path/to/model \
|
--param_path ./params \
|
||||||
--input_json_file /path/to/input.jsonl \
|
--input_json_file input.jsonl \
|
||||||
--output_json_file /path/to/output.jsonl
|
--output_json_file output.jsonl
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Docker
|
#### Docker
|
||||||
|
|
@ -125,9 +174,6 @@ docker build -t astrai:latest .
|
||||||
# Run with GPU support
|
# Run with GPU support
|
||||||
docker run --gpus all -it astrai:latest
|
docker run --gpus all -it astrai:latest
|
||||||
|
|
||||||
# Run with specific GPUs
|
|
||||||
docker run --gpus '"device=0,1"' -it astrai:latest
|
|
||||||
|
|
||||||
# Run inference server
|
# Run inference server
|
||||||
docker run --gpus all -p 8000:8000 astrai:latest \
|
docker run --gpus all -p 8000:8000 astrai:latest \
|
||||||
python -m scripts.tools.server --port 8000 --device cuda
|
python -m scripts.tools.server --port 8000 --device cuda
|
||||||
|
|
@ -144,84 +190,37 @@ docker compose --profile cpu up -d
|
||||||
|
|
||||||
> **Note**: `--gpus all` is required for CUDA support. Without it, `torch.cuda.is_available()` will return `False`.
|
> **Note**: `--gpus all` is required for CUDA support. Without it, `torch.cuda.is_available()` will return `False`.
|
||||||
|
|
||||||
#### Start HTTP Server
|
#### HTTP API Examples
|
||||||
|
|
||||||
Start the inference server with OpenAI and Anthropic-compatible HTTP API:
|
Additional request examples beyond the [Getting Started](#getting-started) flow:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m scripts.tools.server --port 8000 --device cuda
|
|
||||||
```
|
|
||||||
|
|
||||||
Make requests:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# OpenAI-compatible
|
|
||||||
curl -X POST http://localhost:8000/v1/chat/completions \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"messages": [{"role": "user", "content": "Hello"}],
|
|
||||||
"max_tokens": 512
|
|
||||||
}'
|
|
||||||
|
|
||||||
# OpenAI-compatible streaming
|
# OpenAI-compatible streaming
|
||||||
curl -X POST http://localhost:8000/v1/chat/completions \
|
curl -X POST http://localhost:8000/v1/chat/completions \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{
|
-d '{"messages":[{"role":"user","content":"Tell a story"}],"stream":true,"max_tokens":500}'
|
||||||
"messages": [{"role": "user", "content": "Tell a story"}],
|
|
||||||
"stream": true,
|
|
||||||
"max_tokens": 500
|
|
||||||
}'
|
|
||||||
|
|
||||||
# Anthropic-compatible
|
# Anthropic-compatible
|
||||||
curl -X POST http://localhost:8000/v1/messages \
|
curl -X POST http://localhost:8000/v1/messages \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{
|
-d '{"model":"astrai","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}],"max_tokens":512}'
|
||||||
"model": "astrai",
|
|
||||||
"system": "You are a helpful assistant.",
|
|
||||||
"messages": [{"role": "user", "content": "Hello"}],
|
|
||||||
"max_tokens": 512
|
|
||||||
}'
|
|
||||||
|
|
||||||
# Anthropic-compatible streaming with stop sequences
|
# Anthropic-compatible streaming with stop sequences
|
||||||
curl -X POST http://localhost:8000/v1/messages \
|
curl -X POST http://localhost:8000/v1/messages \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{
|
-d '{"model":"astrai","messages":[{"role":"user","content":"Write a story"}],"max_tokens":500,"stream":true,"stop_sequences":["The end"]}'
|
||||||
"model": "astrai",
|
|
||||||
"messages": [{"role": "user", "content": "Write a story"}],
|
|
||||||
"max_tokens": 500,
|
|
||||||
"stream": true,
|
|
||||||
"stop_sequences": ["The end"]
|
|
||||||
}'
|
|
||||||
|
|
||||||
# Health check
|
# Health check
|
||||||
curl http://localhost:8000/health
|
curl http://localhost:8000/health
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Demo
|
See [Inference Guide](assets/docs/inference.md) for SSE streaming format, error codes, and stats endpoint.
|
||||||
|
|
||||||
Check out the demos in the `scripts/demo/` folder:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Download model weights (required before running demos)
|
|
||||||
python scripts/demo/download.py
|
|
||||||
|
|
||||||
# Interactive streaming chat
|
|
||||||
python scripts/demo/stream_chat.py
|
|
||||||
|
|
||||||
# Batch generation
|
|
||||||
python scripts/demo/generate_batch.py
|
|
||||||
|
|
||||||
# Auto‑regressive generation
|
|
||||||
python scripts/demo/generate_ar.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Watch a video walkthrough on [bilibili](https://www.bilibili.com/video/BV1fuLB6yEj6).
|
|
||||||
|
|
||||||
### Documentation
|
### Documentation
|
||||||
|
|
||||||
| Document | Description |
|
| Document | Description |
|
||||||
|----------|-------------|
|
|----------|-------------|
|
||||||
| [Parameter Guide](./assets/docs/params.md) | Training & inference parameters |
|
| [CLI Reference](./assets/docs/params.md) | Parameters for all CLI tools (train, server, generate, preprocess) |
|
||||||
| [Architecture](./assets/docs/architecture.md) | System architecture, class diagram & design patterns |
|
| [Architecture](./assets/docs/architecture.md) | System architecture, class diagram & design patterns |
|
||||||
| [Training](./assets/docs/training.md) | Training loop, strategies & formulas |
|
| [Training](./assets/docs/training.md) | Training loop, strategies & formulas |
|
||||||
| [Inference](./assets/docs/inference.md) | KVCache, continuous batching, sampling & HTTP API |
|
| [Inference](./assets/docs/inference.md) | KVCache, continuous batching, sampling & HTTP API |
|
||||||
|
|
|
||||||
|
|
@ -18,7 +18,6 @@
|
||||||
<img src="https://img.shields.io/github/v/release/ViperEkura/AstrAI?label=Release&color=76bad9" alt="release">
|
<img src="https://img.shields.io/github/v/release/ViperEkura/AstrAI?label=Release&color=76bad9" alt="release">
|
||||||
<img src="https://img.shields.io/github/stars/ViperEkura/AstrAI?style=flat&label=Stars&color=76bad9" alt="stars">
|
<img src="https://img.shields.io/github/stars/ViperEkura/AstrAI?style=flat&label=Stars&color=76bad9" alt="stars">
|
||||||
<img src="https://img.shields.io/github/forks/ViperEkura/AstrAI?style=flat&label=Forks&color=76bad9" alt="forks">
|
<img src="https://img.shields.io/github/forks/ViperEkura/AstrAI?style=flat&label=Forks&color=76bad9" alt="forks">
|
||||||
<img src="https://img.shields.io/github/actions/workflow/status/ViperEkura/AstrAI/tests.yml?label=CI&color=76bad9" alt="ci">
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<br>
|
<br>
|
||||||
|
|
@ -35,7 +34,8 @@
|
||||||
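## 📖 Table of Contents
|
## 📖 Table of Contents
|
||||||
|
|
||||||
- [Features](#特性)
|
- [Features](#特性)
|
||||||
- [Quick Start](#快速开始)
|
- [Getting Started](#快速上手)
|
||||||
|
- [Demo](#演示)
|
||||||
- [Documentation](#文档)
|
- [Documentation](#文档)
|
||||||
- [Contributing](#贡献)
|
- [Contributing](#贡献)
|
||||||
- [Community](#社区)
|
- [Community](#社区)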
|
||||||
|
|
@ -56,33 +56,43 @@
|
||||||
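- 🤗 **HuggingFace-Style API**: HuggingFace-like AutoModel/AutoTokenizer interfaces for easy model and tokenizer loading.
|
- 🤗 **HuggingFace-Style API**: HuggingFace-like AutoModel/AutoTokenizer interfaces for easy model and tokenizer loading.
|
||||||
- 🔌 **Dual API Compatibility**: Supports both OpenAI and Anthropic chat completion APIs out of the box.
|
- 🔌 **Dual API Compatibility**: Supports both OpenAI and Anthropic chat completion APIs out of the box.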
|
||||||
|
|
||||||
### Quick Start
|
### Getting Started
|
||||||
|
|
||||||
#### Installation
|
End-to-end walkthrough in 5 steps:
|
||||||
|
|
||||||
|
**1. Install**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
git clone https://github.com/ViperEkura/AstrAI.git
|
git clone https://github.com/ViperEkura/AstrAI.git
|
||||||
cd AstrAI
|
cd AstrAI
|
||||||
pip install -e .
|
pip install -e .
|
||||||
|
# pip install -e ".[dev]" # optional: dev dependencies (pytest, ruff)
|
||||||
```
|
```
|
||||||
|
|
||||||
Install development dependencies:
|
**2. Download model**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install -e ".[dev]"
|
python scripts/demo/download.py # downloads 1B checkpoint to params/
|
||||||
```
|
```
|
||||||
|
|
||||||
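#### Download Pre-trained Model
|
**3. Preprocess data**
|
||||||
|
|
||||||
Download the pre-trained model weights (1B bilingual checkpoint) to the `params/` directory:
|
Create `pretrain.json` (preprocessing config for the `seq` strategy):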
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"version": 1,
|
||||||
|
"input": {"sections": [{"field": "text", "action": "train"}]},
|
||||||
|
"preprocessing": {"max_seq_len": 2048},
|
||||||
|
"output": {"storage_format": "bin"}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/demo/download.py
|
python scripts/tools/preprocess.py data/*.jsonl -o output/ -c pretrain.json
|
||||||
```
|
```
|
||||||
|
|
||||||
Or download manually from [HuggingFace](https://huggingface.co/ViperEk/KHAOSZ) and place the files in `params/`.
|
**4. Train**
|
||||||
|
|
||||||
#### Train a Model
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||||
|
|
@ -109,15 +119,54 @@ nohup python scripts/tools/train.py \
|
||||||
> out.log 2> err.log &
|
> out.log 2> err.log &
|
||||||
```
|
```
|
||||||
|
|
||||||
See the [Parameter Guide](./params.md) for the full parameter list.
|
**5. Serve & query**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Terminal 1: start server
|
||||||
|
python scripts/tools/server.py --param_path ./params --device cuda
|
||||||
|
|
||||||
|
# Terminal 2: query
|
||||||
|
curl http://localhost:8000/v1/chat/completions \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":512}'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Demo
|
||||||
|
|
||||||
|
Check out the demos in the `scripts/demo/` folder:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Download model weights (required before running demos)
|
||||||
|
python scripts/demo/download.py # model → params/
|
||||||
|
|
||||||
|
# Interactive streaming chat (multi-turn, maintains history)
|
||||||
|
python scripts/demo/stream_chat.py
|
||||||
|
# Type your message after >>, type !exit to quit
|
||||||
|
|
||||||
|
# Batch generation (5 hardcoded prompts, non-streaming)
|
||||||
|
python scripts/demo/generate_batch.py
|
||||||
|
|
||||||
|
# Single-prompt autoregressive streaming
|
||||||
|
python scripts/demo/generate_ar.py
|
||||||
|
```
|
||||||
|
|
||||||
|
All generation demos use `temperature=0.8`, `top_p=0.95`, `top_k=50`, `max_tokens=2048` by default and require `params/` to contain model weights (run `download.py` first).
|
||||||
|
|
||||||
|
Watch a video walkthrough on [bilibili](https://www.bilibili.com/video/BV1fuLB6yEj6).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
See [Documentation](#文档) for more options.
|
||||||
|
|
||||||
#### Text Generation
|
#### Text Generation
|
||||||
|
|
||||||
|
Batch generation from a JSONL file:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python scripts/tools/generate.py \
|
python scripts/tools/generate.py \
|
||||||
--param_path /path/to/model \
|
--param_path ./params \
|
||||||
--input_json_file /path/to/input.jsonl \
|
--input_json_file input.jsonl \
|
||||||
--output_json_file /path/to/output.jsonl
|
--output_json_file output.jsonl
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Docker
|
#### Docker
|
||||||
|
|
@ -131,9 +180,6 @@ docker build -t astrai:latest .
|
||||||
# Run with GPU support
|
# Run with GPU support
|
||||||
docker run --gpus all -it astrai:latest
|
docker run --gpus all -it astrai:latest
|
||||||
|
|
||||||
# Run with specific GPUs
|
|
||||||
docker run --gpus '"device=0,1"' -it astrai:latest
|
|
||||||
|
|
||||||
# Run inference server
|
# Run inference server
|
||||||
docker run --gpus all -p 8000:8000 astrai:latest \
|
docker run --gpus all -p 8000:8000 astrai:latest \
|
||||||
python -m scripts.tools.server --port 8000 --device cuda
|
python -m scripts.tools.server --port 8000 --device cuda
|
||||||
|
|
@ -150,84 +196,37 @@ docker compose --profile cpu up -d
|
||||||
|
|
||||||
> **Note**: `--gpus all` is required for CUDA support. Without it, `torch.cuda.is_available()` will return `False`.
|
> **Note**: `--gpus all` is required for CUDA support. Without it, `torch.cuda.is_available()` will return `False`.
|
||||||
|
|
||||||
#### Start HTTP Server
|
#### HTTP API Examples
|
||||||
|
|
||||||
Start the inference server with OpenAI and Anthropic-compatible HTTP API:
|
Additional request examples beyond the [Getting Started](#快速上手) flow:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m scripts.tools.server --port 8000 --device cuda
|
|
||||||
```
|
|
||||||
|
|
||||||
Make requests:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# OpenAI-compatible
|
|
||||||
curl -X POST http://localhost:8000/v1/chat/completions \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"messages": [{"role": "user", "content": "Hello"}],
|
|
||||||
"max_tokens": 512
|
|
||||||
}'
|
|
||||||
|
|
||||||
# OpenAI-compatible streaming
|
# OpenAI-compatible streaming
|
||||||
curl -X POST http://localhost:8000/v1/chat/completions \
|
curl -X POST http://localhost:8000/v1/chat/completions \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{
|
-d '{"messages":[{"role":"user","content":"Tell a story"}],"stream":true,"max_tokens":500}'
|
||||||
"messages": [{"role": "user", "content": "Tell a story"}],
|
|
||||||
"stream": true,
|
|
||||||
"max_tokens": 500
|
|
||||||
}'
|
|
||||||
|
|
||||||
# Anthropic-compatible
|
# Anthropic-compatible
|
||||||
curl -X POST http://localhost:8000/v1/messages \
|
curl -X POST http://localhost:8000/v1/messages \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{
|
-d '{"model":"astrai","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}],"max_tokens":512}'
|
||||||
"model": "astrai",
|
|
||||||
"system": "You are a helpful assistant.",
|
|
||||||
"messages": [{"role": "user", "content": "Hello"}],
|
|
||||||
"max_tokens": 512
|
|
||||||
}'
|
|
||||||
|
|
||||||
# Anthropic-compatible streaming with stop sequences
|
# Anthropic-compatible streaming with stop sequences
|
||||||
curl -X POST http://localhost:8000/v1/messages \
|
curl -X POST http://localhost:8000/v1/messages \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{
|
-d '{"model":"astrai","messages":[{"role":"user","content":"Write a story"}],"max_tokens":500,"stream":true,"stop_sequences":["The end"]}'
|
||||||
"model": "astrai",
|
|
||||||
"messages": [{"role": "user", "content": "Write a story"}],
|
|
||||||
"max_tokens": 500,
|
|
||||||
"stream": true,
|
|
||||||
"stop_sequences": ["The end"]
|
|
||||||
}'
|
|
||||||
|
|
||||||
# Health check
|
# Health check
|
||||||
curl http://localhost:8000/health
|
curl http://localhost:8000/health
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Demo
|
See the [Inference Guide](./inference.md) for the SSE streaming format, error codes, and stats endpoint.
|
||||||
|
|
||||||
Check out the demos in the `scripts/demo/` folder:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Download model weights (required before running demos)
|
|
||||||
python scripts/demo/download.py
|
|
||||||
|
|
||||||
# Interactive streaming chat
|
|
||||||
python scripts/demo/stream_chat.py
|
|
||||||
|
|
||||||
# Batch generation
|
|
||||||
python scripts/demo/generate_batch.py
|
|
||||||
|
|
||||||
# Auto-regressive generation
|
|
||||||
python scripts/demo/generate_ar.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Watch a video walkthrough on [bilibili](https://www.bilibili.com/video/BV1fuLB6yEj6).
|
|
||||||
|
|
||||||
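### Documentation
|
### Documentation
|
||||||
|
|
||||||
| Document | Description |
|
| Document | Description |
|
||||||
|------|------|
|
|------|------|
|
||||||
| [Parameter Guide](./params.md) | Training & inference parameters |
|
| [CLI Reference](./params.md) | Parameters for all CLI tools (train, server, generate, preprocess) |
|
||||||
| [Architecture](./architecture.md) | System architecture, class diagram & design patterns |
|
| [Architecture](./architecture.md) | System architecture, class diagram & design patterns |
|
||||||
| [Training](./training.md) | Training loop, strategies & formulas |
|
| [Training](./training.md) | Training loop, strategies & formulas |
|
||||||
| [Inference](./inference.md) | KVCache, continuous batching, sampling & HTTP API |
|
| [Inference](./inference.md) | KVCache, continuous batching, sampling & HTTP API |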
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,12 @@
|
||||||
# AstrAI Architecture
|
# AstrAI Architecture
|
||||||
|
|
||||||
|
## Contents
|
||||||
|
|
||||||
|
- [Class Diagram](#class-diagram) — Full Mermaid class diagram across 10+ namespaces
|
||||||
|
- [Module Overview](#module-overview) — Component inventory per module
|
||||||
|
- [Design Patterns](#design-patterns) — 13 documented patterns with classes
|
||||||
|
- [Core Relationships](#core-relationships) — 11 key inter-component relationships
|
||||||
|
|
||||||
## Class Diagram
|
## Class Diagram
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
|
|
|
||||||
|
|
@ -1,17 +1,58 @@
|
||||||
# Data Flow
|
# Data Flow
|
||||||
|
|
||||||
This document describes the data pipeline: from raw text to model input tensors.
|
This document describes the data pipeline: from raw text to model input tensors. For creating preprocessing configs, see [Preprocessing Guide](preprocessing.md).
|
||||||
|
|
||||||
|
## Contents
|
||||||
|
|
||||||
|
- [Overview](#overview)
|
||||||
|
- [Data Preparation](#data-preparation) — tokenization, format detection, backends
|
||||||
|
- [Data Keys by Training Type](#data-keys-by-training-type)
|
||||||
|
- [Dataset Architecture](#dataset-architecture)
|
||||||
|
- [Sampler](#sampler)
|
||||||
|
- [DataLoader](#dataloader)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
```
|
```
|
||||||
Raw Text → AutoTokenizer → Token IDs → .h5/.bin → Store.load() → Store.fetch() → Dataset → Sampler → DataLoader → Training/Inference
|
JSONL Lines → Pipeline (mask builder) → Tokenized Tensors
|
||||||
|
↓
|
||||||
|
.h5 or .bin storage
|
||||||
|
↓
|
||||||
|
Store.load()
|
||||||
|
↓
|
||||||
|
Store.fetch(begin, end, keys)
|
||||||
|
↓
|
||||||
|
BaseDataset.__getitem__(idx)
|
||||||
|
↓
|
||||||
|
Sampler → DataLoader → Training / Inference
|
||||||
```
|
```
|
||||||
|
|
||||||
## Data Preparation
|
## Data Preparation
|
||||||
|
|
||||||
Raw text is tokenized via `AutoTokenizer.encode()` and saved as HDF5 (`.h5`) or binary (`.bin` + `meta.json`) files with keyed tensor groups.
|
Raw text is tokenized via `AutoTokenizer.encode()` and saved as HDF5 (`.h5`) or binary (`.bin` + `meta.json`) files with keyed tensor groups.
|
||||||
|
|
||||||
|
### Tokenization
|
||||||
|
|
||||||
|
The `Pipeline` reads JSONL lines, applies the mask builder (see [Preprocessing](preprocessing.md)), and produces flat token sequences:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Per JSONL line: messages → chat template → token IDs + loss mask
|
||||||
|
tokens = tokenizer.encode(rendered_text) # List[int]
|
||||||
|
loss_mask = [0, 0, 0, 1, 1, 1, 1, 1, 1] # 0=masked, 1=train
|
||||||
|
# Stored as flat tensors, packed with other lines by packing strategy
|
||||||
|
```
|
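The pairing of token IDs with a 0/1 loss mask shown above can be sketched without a tokenizer. This is an illustrative reconstruction, not the `Pipeline` implementation: it assumes each line has already been split into tokenized segments tagged as trainable or masked, and `build_loss_mask` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def build_loss_mask(
    segments: Sequence[Tuple[List[int], bool]],
) -> Tuple[List[int], List[int]]:
    """Flatten (token_ids, train_flag) segments into tokens plus a 0/1 mask.

    Hypothetical helper: 1 marks tokens that contribute to the loss,
    0 marks tokens that are masked out (for example, the prompt).
    """
    tokens: List[int] = []
    mask: List[int] = []
    for ids, train in segments:
        tokens.extend(ids)
        mask.extend([1 if train else 0] * len(ids))
    return tokens, mask
```

Three masked prompt tokens followed by six trainable tokens reproduce the `[0, 0, 0, 1, 1, 1, 1, 1, 1]` mask from the example.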
||||||
|
|
||||||
|
The output `meta.json` records the storage format, key names, dtype, total token count, and tensor shapes for each shard.
|
||||||
|
|
||||||
|
### Format Detection
|
||||||
|
|
||||||
|
`detect_format(load_path)` inspects the directory:
|
||||||
|
|
||||||
|
- If `*.h5` files exist → `"h5"` (HDF5 backend)
|
||||||
|
- If `*.bin` + `meta.json` files exist → `"bin"` (memory-mapped backend)
|
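The two detection rules can be sketched as follows. This is an assumption-laden reimplementation for illustration, not the project's `detect_format`: the name `detect_format_sketch`, the precedence of `.h5` when both formats are present, and the `ValueError` on an unrecognised directory are all guesses.

```python
from pathlib import Path


def detect_format_sketch(load_path: str) -> str:
    """Return "h5" or "bin" based on the files found in load_path."""
    root = Path(load_path)
    # Rule 1: any *.h5 file selects the HDF5 backend.
    if any(root.glob("*.h5")):
        return "h5"
    # Rule 2: *.bin shards plus meta.json select the memory-mapped backend.
    if any(root.glob("*.bin")) and (root / "meta.json").is_file():
        return "bin"
    # Assumption: an unrecognised layout is reported as an error.
    raise ValueError(f"no .h5 or .bin + meta.json data found in {root}")
```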
||||||
|
|
||||||
|
### Store Backends
|
||||||
|
|
||||||
Storage format is auto-detected by `detect_format()`; backends are dispatched via registry:
|
Storage format is auto-detected by `detect_format()`; backends are dispatched via registry:
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
@ -19,7 +60,11 @@ StoreFactory.create("h5") → H5Store
|
||||||
StoreFactory.create("bin") → MmapStore
|
StoreFactory.create("bin") → MmapStore
|
||||||
```
|
```
|
||||||
|
|
||||||
H5 backend supports shared memory via `.share_memory_()`. Bin (mmap) uses OS page-cache sharing natively.
|
**H5Store**: Reads HDF5 files and supports `share_memory_()` for multi-process DataLoader workers (copies tensors to shared memory).
|
||||||
|
|
||||||
|
**MmapStore**: Memory-maps `.bin` files. OS page cache sharing is native — no explicit `share_memory_()` needed. Uses `torch.from_numpy(np.memmap(...))`.
|
||||||
|
|
||||||
|
Both backends normalise tensors into `Store._data` (`Dict[str, List[Tensor]]`) and `Store._cum` (`Dict[str, List[int]]`, cumulative lengths for bisect-based indexing).
|
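Bisect-based indexing over cumulative lengths can be illustrated with plain lists. The sketch below assumes the cumulative list holds running totals without a leading zero; the actual layout of `Store._cum` may differ, and `build_cum` and `locate` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def build_cum(lengths: List[int]) -> List[int]:
    """Running totals of per-tensor lengths, e.g. [4, 3, 5] -> [4, 7, 12]."""
    return list(accumulate(lengths))


def locate(cum: List[int], pos: int) -> Tuple[int, int]:
    """Map a global token position to (tensor index, offset inside that tensor)."""
    total = cum[-1] if cum else 0
    if not 0 <= pos < total:
        raise IndexError(pos)
    # First tensor whose cumulative end lies strictly beyond pos.
    index = bisect_right(cum, pos)
    start = cum[index - 1] if index else 0
    return index, pos - start
```

Each lookup costs O(log n) in the number of stored tensors, which is why cumulative lengths are kept alongside the data instead of scanning tensors linearly.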
||||||
|
|
||||||
## Data Keys by Training Type
|
## Data Keys by Training Type
|
||||||
|
|
||||||
|
|
@ -61,4 +106,4 @@ DatasetFactory.load(train_type, load_path, window_size, stride=None, storage_typ
|
||||||
|
|
||||||
Standard PyTorch `DataLoader` with configurable `batch_size`, `num_workers`, `pin_memory`, `prefetch_factor`. Sampler produces indices; dataloader fetches tensor batches via `__getitem__`.
|
Standard PyTorch `DataLoader` with configurable `batch_size`, `num_workers`, `pin_memory`, `prefetch_factor`. Sampler produces indices; dataloader fetches tensor batches via `__getitem__`.
|
||||||
|
|
||||||
> Document Update Time: 2026-05-30
|
> Document Update Time: 2026-06-19
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,16 @@
|
||||||
# Inference
|
# Inference
|
||||||
|
|
||||||
|
## Contents
|
||||||
|
|
||||||
|
- [KV Cache](#kv-cache)
|
||||||
|
- [KVCache System](#kvcache-system)
|
||||||
|
- [Continuous Batching](#continuous-batching)
|
||||||
|
- [Sampling](#sampling-strategy-pattern)
|
||||||
|
- [Protocol Handlers](#protocol-handlers-strategy-pattern)
|
||||||
|
- [Engine & GenerateResult](#engine--generateresult)
|
||||||
|
- [HTTP API](#http-api) — endpoints, SSE, errors, stats
|
||||||
|
- [Engine API](#engine-api)
|
||||||
|
|
||||||
## KV Cache
|
## KV Cache
|
||||||
|
|
||||||
At decode time, only the last query token matters. All previous K/V are cached to avoid recomputation:
|
At decode time, only the last query token matters. All previous K/V are cached to avoid recomputation:
|
||||||
|
|
@ -133,6 +144,92 @@ Supports `stop_sequences` and streaming via `event: content_block_delta`.
|
||||||
| `max_tokens` | Optional[int] | None | Max generation length |
|
| `max_tokens` | Optional[int] | None | Max generation length |
|
||||||
| `stream` | bool | False | Stream output |
|
| `stream` | bool | False | Stream output |
|
||||||
|
|
||||||
|
### SSE Streaming Format
|
||||||
|
|
||||||
|
**OpenAI** (`/v1/chat/completions`, `stream=true`):
|
||||||
|
|
||||||
|
```
|
||||||
|
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":...,"model":"astrai",
|
||||||
|
"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
|
||||||
|
|
||||||
|
data: {"id":"chatcmpl-...","object":"chat.completion.chunk",...,
|
||||||
|
"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
|
||||||
|
|
||||||
|
data: {"id":"chatcmpl-...","object":"chat.completion.chunk",...,
|
||||||
|
"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],
|
||||||
|
"usage":{"prompt_tokens":5,"completion_tokens":1,"total_tokens":6}}
|
||||||
|
|
||||||
|
data: [DONE]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Anthropic** (`/v1/messages`, `stream=true`):
|
||||||
|
|
||||||
|
```
|
||||||
|
event: message_start
|
||||||
|
data: {"type":"message_start","message":{"id":"msg_...","model":"astrai","role":"assistant",
|
||||||
|
"content":[],"stop_reason":null,...}}
|
||||||
|
|
||||||
|
event: content_block_start
|
||||||
|
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
|
||||||
|
|
||||||
|
event: content_block_delta
|
||||||
|
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
|
||||||
|
|
||||||
|
event: content_block_stop
|
||||||
|
data: {"type":"content_block_stop","index":0}
|
||||||
|
|
||||||
|
event: message_delta
|
||||||
|
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{...}}
|
||||||
|
|
||||||
|
event: message_stop
|
||||||
|
data: {"type":"message_stop"}
|
||||||
|
```
|
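A client can reassemble the generated text from either stream with a small line parser. The sketch below is illustrative only: it assumes each event's JSON arrives on a single `data:` line (the samples above are wrapped for readability), and the function names are hypothetical.

```python
import json
from typing import Iterable, List, Optional


def collect_openai_stream(lines: Iterable[str]) -> str:
    """Concatenate `delta.content` pieces from OpenAI-style SSE lines."""
    pieces: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            pieces.append(choice.get("delta", {}).get("content") or "")
    return "".join(pieces)


def collect_anthropic_stream(lines: Iterable[str]) -> str:
    """Concatenate `text_delta` pieces from Anthropic-style SSE lines."""
    pieces: List[str] = []
    event: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            if event == "message_stop":
                break  # final event of the message
            if event == "content_block_delta":
                delta = json.loads(line[len("data:"):].strip()).get("delta", {})
                if delta.get("type") == "text_delta":
                    pieces.append(delta.get("text", ""))
    return "".join(pieces)
```

Both functions accept any iterable of decoded lines, so they work equally on a list of strings or on a streamed HTTP response read line by line.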
||||||
|
|
||||||
|
### Error Responses
|
||||||
|
|
||||||
|
All endpoints use standard HTTP status codes:
|
||||||
|
|
||||||
|
| Status | Meaning |
|
||||||
|
|--------|---------|
|
||||||
|
| 200 | Success |
|
||||||
|
| 400 | Invalid request (bad JSON, missing fields, validation error) |
|
||||||
|
| 405 | Method not allowed |
|
||||||
|
| 422 | Unprocessable entity (Pydantic validation) |
|
||||||
|
| 500 | Internal server error (model crash, OOM, scheduler failure) |
|
||||||
|
| 503 | Service unavailable (model not loaded, engine not ready) |
|
||||||
|
|
||||||
|
Error response body:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"error": {
|
||||||
|
"message": "Invalid request: max_tokens must be > 0",
|
||||||
|
"type": "invalid_request_error",
|
||||||
|
"code": 400
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
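On the client side, the status table and error body can be turned into an exception. The following is a minimal sketch, assuming the body follows the `error.message` / `error.type` shape shown above; `ApiError` and `raise_for_error` are hypothetical names, and bodies in any other shape fall back to the raw text.

```python
import json


class ApiError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"[{status}] {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def raise_for_error(status: int, body: str) -> None:
    """Raise ApiError for any non-200 status; return None on success."""
    if status == 200:
        return
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = None
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}  # unknown body shape: keep the raw text as the message
    raise ApiError(
        status,
        str(error.get("type", "unknown_error")),
        str(error.get("message", body)),
    )
```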
||||||
|
|
||||||
|
### Stats Endpoint
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /stats
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"active_requests": 3,
|
||||||
|
"waiting_requests": 2,
|
||||||
|
"total_requests": 128,
|
||||||
|
"cache_usage": 0.45,
|
||||||
|
"tokens_generated": 10240
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`cache_usage` is the fraction of KV cache pages currently in use (0.0–1.0).
|
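A monitoring script might decode the `/stats` payload into a typed record. This is a sketch under the assumption that the response contains exactly the five documented keys; `ServerStats`, `parse_stats`, and the back-off rule in `should_throttle` are illustrative and not part of the server.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ServerStats:
    active_requests: int
    waiting_requests: int
    total_requests: int
    cache_usage: float  # fraction of KV cache pages in use, 0.0 to 1.0
    tokens_generated: int


def parse_stats(body: str) -> ServerStats:
    """Decode a /stats response body, keeping only the documented keys."""
    payload = json.loads(body)
    return ServerStats(**{f.name: payload[f.name] for f in fields(ServerStats)})


def should_throttle(stats: ServerStats, max_cache_usage: float = 0.9) -> bool:
    """Hypothetical client policy: back off if the cache is nearly full or requests queue up."""
    return stats.cache_usage >= max_cache_usage or stats.waiting_requests > 0
```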
||||||
|
|
||||||
## Engine API
|
## Engine API
|
||||||
|
|
||||||
```python
|
```python
|
||||||
|
|
@ -149,4 +246,4 @@ async for token in engine.generate_async("Hello", ...): # -> AsyncGenerator[s
|
||||||
print(token)
|
print(token)
|
||||||
```
|
```
|
||||||
|
|
||||||
> Document Update Time: 2026-05-30
|
> Document Update Time: 2026-06-19
|
||||||
|
|
|
||||||
|
|
@ -1,4 +1,11 @@
|
||||||
# Parameter Documentation
|
# CLI Parameter Reference
|
||||||
|
|
||||||
|
## Contents
|
||||||
|
|
||||||
|
- [Training Parameters](#training-parameters)
|
||||||
|
- [Inference Server](#inference-server-serverpy)
|
||||||
|
- [Generate](#generate-generatepy)
|
||||||
|
- [Preprocess](#preprocess-preprocesspy)
|
||||||
|
|
||||||
## Training Parameters
|
## Training Parameters
|
||||||
|
|
||||||
|
|
@ -122,4 +129,64 @@ nohup python scripts/tools/train.py \
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
> Document Update Time: 2026-05-24
|
## Inference Server (`server.py`)
|
||||||
|
|
||||||
|
| Parameter | Type | Default | Description |
|
||||||
|
|-----------|------|---------|-------------|
|
||||||
|
| `--host` | str | `0.0.0.0` | Host address |
|
||||||
|
| `--port` | int | `8000` | Port number |
|
||||||
|
| `--param_path` | path | `project_root/params` | Path to model parameters |
|
||||||
|
| `--device` | str | `cuda` | Device to load model on |
|
||||||
|
| `--dtype` | str | `bfloat16` | Model weights dtype (`bfloat16`, `float16`, `float32`) |
|
||||||
|
| `--max_batch_size` | int | `16` | Maximum batch size for continuous batching |
|
||||||
|
| `--reload` | flag | `False` | Enable auto-reload for development |
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
```bash
|
||||||
|
python scripts/tools/server.py --param_path ./params --device cuda --dtype bfloat16
|
||||||
|
```
|
||||||
|
|
||||||
|
See [Inference Guide](inference.md) for HTTP API documentation.
|
||||||
|
|
||||||
|
## Generate (`generate.py`)
|
||||||
|
|
||||||
|
| Parameter | Type | Default | Description |
|
||||||
|
|-----------|------|---------|-------------|
|
||||||
|
| `--param_path` | str | required | Path to the model directory |
|
||||||
|
| `--input_json_file` | str | required | Path to the input JSONL file |
|
||||||
|
| `--output_json_file` | str | required | Path to the output JSONL file |
|
||||||
|
| `--question_key` | str | `question` | Key for the question in input JSON |
|
||||||
|
| `--response_key` | str | `response` | Key for the response in output JSON |
|
||||||
|
| `--temperature` | float | `0.60` | Sampling temperature |
|
||||||
|
| `--top_k` | int | `30` | Top-k filtering |
|
||||||
|
| `--top_p` | float | `0.95` | Nucleus sampling threshold |
|
||||||
|
| `--batch_size` | int | `1` | Batch size for generation |
|
||||||
|
| `--max_tokens` | int | `2048` | Maximum tokens to generate |
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
```bash
|
||||||
|
python scripts/tools/generate.py \
|
||||||
|
--param_path ./params \
|
||||||
|
--input_json_file input.jsonl \
|
||||||
|
--output_json_file output.jsonl
|
||||||
|
```
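The input file for the command above can be prepared with a few lines of Python. The sketch assumes each line is one JSON object holding the prompt under the default `--question_key` (`question`); the exact record schema is not shown here, so treat the shape as an assumption. `write_questions` and `read_jsonl` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def write_questions(path: Path, questions: Iterable[str], question_key: str = "question") -> None:
    """Write one JSON object per line, with the prompt stored under `question_key`."""
    with path.open("w", encoding="utf-8") as f:
        for question in questions:
            f.write(json.dumps({question_key: question}, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> List[Dict[str, str]]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "input.jsonl"
    write_questions(input_path, ["What is RoPE?", "Explain causal masking."])
    records = read_jsonl(input_path)

print(records)
```

The same `read_jsonl` helper could be pointed at the output file, where the generated text would be expected under the `--response_key` name.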
|
||||||
|
|
||||||
|
## Preprocess (`preprocess.py`)
|
||||||
|
|
||||||
|
| Parameter | Type | Default | Description |
|
||||||
|
|-----------|------|---------|-------------|
|
||||||
|
| `input_files` | path(s) | required | Input JSONL file(s), supports glob (`data/*.jsonl`) |
|
||||||
|
| `--output_dir`, `-o` | path | required | Output directory for processed data |
|
||||||
|
| `--config`, `-c` | path | required | Preprocessing pipeline config (JSON) |
|
||||||
|
| `--num_workers` | int | `4` | Number of parallel workers |
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
```bash
|
||||||
|
python scripts/tools/preprocess.py data/*.jsonl -o output/ -c sft.json
|
||||||
|
```
|
||||||
|
|
||||||
|
See [Preprocessing Guide](preprocessing.md) for config file format and examples.
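When the shell does not expand `data/*.jsonl` itself (for example, when the pattern is quoted), the positional `input_files` patterns would need to be expanded by the program. The sketch below shows one way that could work using the standard `glob` module; it is an assumption about behaviour, not the script's actual code, and `expand_input_files` is a hypothetical name.

```python
import glob
import tempfile
from pathlib import Path
from typing import Iterable, List


def expand_input_files(patterns: Iterable[str]) -> List[str]:
    """Expand glob patterns into a de-duplicated list of paths, sorted within each pattern."""
    files: List[str] = []
    for pattern in patterns:
        for match in sorted(glob.glob(pattern)):
            if match not in files:
                files.append(match)
    return files


# Demo: only the .jsonl files in the temporary directory are picked up.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.jsonl", "a.jsonl", "notes.txt"):
        (Path(tmp) / name).write_text("{}\n", encoding="utf-8")
    pattern = str(Path(tmp) / "*.jsonl")
    found = [Path(p).name for p in expand_input_files([pattern])]

print(found)
```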
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
> Document Update Time: 2026-06-19
|
||||||
|
|
@ -2,6 +2,17 @@
|
||||||
|
|
||||||
Declarative JSON-driven data preprocessing. One `SectionedMaskBuilder` handles all formats via `input.sections` (single-output) or `input.sources` (multi-output).
|
Declarative JSON-driven data preprocessing. One `SectionedMaskBuilder` handles all formats via `input.sections` (single-output) or `input.sources` (multi-output).
|
||||||
|
|
||||||
|
## Contents
|
||||||
|
|
||||||
|
- [Philosophy](#philosophy)
|
||||||
|
- [Config Structure](#config-structure)
|
||||||
|
- [Quick Start](#quick-start) — SFT Chat, SFT Instruction, Pretrain, DPO, GRPO examples
|
||||||
|
- [Configuration Reference](#configuration-reference) — all fields
|
||||||
|
- [Mask Algorithm](#mask-algorithm)
|
||||||
|
- [Output Layout](#output-layout)
|
||||||
|
- [CLI](#cli)
|
||||||
|
- [Python API](#python-api)
|
||||||
|
|
||||||
## Philosophy
|
## Philosophy
|
||||||
|
|
||||||
| Component | Responsibility |
|
| Component | Responsibility |
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,18 @@
|
||||||
# Training
|
# Training
|
||||||
|
|
||||||
|
## Contents
|
||||||
|
|
||||||
|
- [Autoregression](#autoregression)
|
||||||
|
- [Causal Mask](#causal-mask)
|
||||||
|
- [Rotary Position Embedding (RoPE)](#rotary-position-embedding-rope)
|
||||||
|
- [Training Loop](#training-loop)
|
||||||
|
- [Strategies](#strategies) — SEQ, SFT, DPO, GRPO
|
||||||
|
- [LR Schedulers](#lr-schedulers)
|
||||||
|
- [Gradient Checkpointing](#gradient-checkpointing)
|
||||||
|
- [Checkpoint](#checkpoint)
|
||||||
|
- [TrainContextBuilder](#traincontextbuilder-builder-pattern)
|
||||||
|
- [Training CLI](#training-cli)
|
||||||
|
|
||||||
### Autoregression
|
### Autoregression
|
||||||
|
|
||||||
Given a token sequence, the model predicts the probability of the next token. Each generated token is appended to the input and fed back, repeating until an end-of-sequence token or max length.
|
Given a token sequence, the model predicts the probability of the next token. Each generated token is appended to the input and fed back, repeating until an end-of-sequence token or max length.
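The loop described above can be sketched with a toy model. Everything here is illustrative: `toy_model` is a hypothetical stand-in for a real network, and greedy selection (taking the most probable token) is an assumption, since the text does not say how the next token is chosen.

```python
from typing import Callable, List, Sequence


def generate(
    next_token_probs: Callable[[Sequence[int]], List[float]],
    prompt: Sequence[int],
    eos_id: int,
    max_length: int,
) -> List[int]:
    """Append one token at a time until `eos_id` is produced or `max_length` is reached."""
    tokens = list(prompt)
    while len(tokens) < max_length:
        probs = next_token_probs(tokens)
        # Greedy choice: pick the token with the highest probability.
        next_id = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_model(tokens: Sequence[int]) -> List[float]:
    """Hypothetical model over a 5-token vocabulary: always predicts (last + 1) mod 5."""
    probs = [0.0] * 5
    probs[(tokens[-1] + 1) % 5] = 1.0
    return probs


out = generate(toy_model, [0], eos_id=3, max_length=10)
print(out)
```

Here generation stops early because token `3` is treated as end-of-sequence; with an `eos_id` that never appears, the loop would stop at `max_length` instead.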
|
||||||
|
|
|
||||||