AstrAI

Go to file

ViperEkura 8ab7564d02 docs: 重构 README 结构，全文档添加目录导航 - README 新增 Getting Started 端到端流程，整合快速开始与演示，去重精简 - 中文 README 同步英文版结构，预处理配置改用 seq 策略 - inference.md 补充 SSE 流式格式、错误响应、/stats 端点文档 - params.md 扩展为 CLI 参考，覆盖 server/generate/preprocess 参数表 - dataflow.md 拆分 tokenization/format detection/backend 子节，新增流程图 - architecture/training/inference/preprocessing 均添加目录导航 - 移除 README CI badge		2026-06-19 13:53:22 +08:00
.github	docs: replace shields.io endpoint badges with github/ direct badges	2026-06-18 15:09:51 +08:00
assets	docs: 重构 README 结构，全文档添加目录导航	2026-06-19 13:53:22 +08:00
astrai	fix: 修复预处理流水线 4 个致命问题	2026-06-18 17:38:01 +08:00
scripts	feat: IFEval 使用 chat template 格式化 prompt，添加 model.eval()	2026-06-18 16:45:16 +08:00
tests	refactor : 清理工厂和配置系统中的死代码与冗余抽象	2026-06-07 11:39:50 +08:00
.dockerignore	build: 修改docker 构建流程	2026-04-10 11:25:00 +08:00
.gitattributes	ci: 优化 GitHub Actions 工作流	2026-04-05 22:40:16 +08:00
.gitignore	feat: 新增 Docker Compose 一键部署，支持 GPU/CPU 双模式	2026-05-09 11:57:46 +08:00
CONTRIBUTING.md	docs: 修复文档中过时的字段、签名和缺失的类	2026-06-18 18:49:46 +08:00
Dockerfile	fix: docker-compose UID/GID 添加默认值，修复 docker.sh logs 命令	2026-05-18 14:24:00 +08:00
LICENSE	Change license from Apache 2.0 to GPL v3.0	2026-02-22 21:20:34 +08:00
README.md	docs: 重构 README 结构，全文档添加目录导航	2026-06-19 13:53:22 +08:00
docker-compose.yml	fix: docker-compose UID/GID 添加默认值，修复 docker.sh logs 命令	2026-05-18 14:24:00 +08:00
pyproject.toml	fix: 修复工厂模式问题并增加chat-template设置	2026-04-04 12:05:05 +08:00

README.md

A lightweight Transformer training & inference framework

English • 中文 • Issue Tracker • Discussions • HuggingFace

English

Features

🚀 High Performance: Optimized for both training and inference with efficient parallelization.
🔧 Flexible: Support for seq/sft/dpo/grpo training, customizable model architectures.
💡 Easy to Use: Simple API with comprehensive examples and demos.
📦 Lightweight: Minimal dependencies, easy to deploy.
🔬 Research‑Friendly: Modular design, easy to experiment with new ideas.
🤗 HuggingFace-Style API: AutoModel/AutoTokenizer APIs inspired by HuggingFace for easy model and tokenizer loading.
🔌 Dual API Compatibility: Supports both OpenAI and Anthropic chat completion APIs out of the box.

Getting Started

End-to-end walkthrough in 5 steps:

1. Install

git clone https://github.com/ViperEkura/AstrAI.git
cd AstrAI
pip install -e .
# pip install -e ".[dev]"    # optional: dev dependencies (pytest, ruff)

2. Download model

python scripts/demo/download.py    # downloads 1B checkpoint to params/

3. Preprocess data

Create pretrain.json (preprocessing config for seq strategy):

{
    "version": 1,
    "input": {"sections": [{"field": "text", "action": "train"}]},
    "preprocessing": {"max_seq_len": 2048},
    "output": {"storage_format": "bin"}
}

python scripts/tools/preprocess.py data/*.jsonl -o output/ -c pretrain.json

4. Train

export CUDA_VISIBLE_DEVICES=0,1,2,3

nohup python scripts/tools/train.py \
    --nprocs=4 \
    --parallel_mode=ddp \
    --train_type=seq \
    --data_root_path=/path/to/dataset \
    --param_path=/path/to/model \
    --batch_per_device=4 \
    --grad_accum_steps=8 \
    --warmup_ratio=0.05 \
    --max_lr=1e-4 \
    --max_grad_norm=1.0 \
    --adamw_beta1=0.9 \
    --adamw_beta2=0.95 \
    --adamw_weight_decay=0.01 \
    --window_size=2048 \
    --ckpt_interval=10000 \
    --ckpt_dir=./checkpoint \
    --random_seed=3407 \
    --label_smoothing=0.05 \
    > out.log 2> err.log &

5. Serve & query

# Terminal 1: start server
python scripts/tools/server.py --param_path ./params --device cuda

# Terminal 2: query
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}],"max_tokens":512}'

Demo

Check out the demos in the scripts/demo/ folder:

# Download model weights (required before running demos)
python scripts/demo/download.py                      # model → params/

# Interactive streaming chat (multi-turn, maintains history)
python scripts/demo/stream_chat.py
# Type your message after >>, type !exit to quit

# Batch generation (5 hardcoded prompts, non-streaming)
python scripts/demo/generate_batch.py

# Single-prompt autoregressive streaming
python scripts/demo/generate_ar.py

All generation demos use temperature=0.8, top_p=0.95, top_k=50, max_tokens=2048 by default and require params/ to contain model weights (run download.py first).

Watch a video walkthrough on bilibili.

See Documentation for full references beyond the examples above.

Text Generation

Batch generation from a JSONL file:

python scripts/tools/generate.py \
    --param_path ./params \
    --input_json_file input.jsonl \
    --output_json_file output.jsonl

Docker

Build and run with Docker (recommended for GPU environments):

# Build image
docker build -t astrai:latest .

# Run with GPU support
docker run --gpus all -it astrai:latest

# Run inference server
docker run --gpus all -p 8000:8000 astrai:latest \
  python -m scripts.tools.server --port 8000 --device cuda

# Run with volume mount for data
docker run --gpus all -v /path/to/data:/data -it astrai:latest

# Docker Compose (GPU, default)
docker compose up -d

# Docker Compose (CPU only)
docker compose --profile cpu up -d

Note: --gpus all is required for CUDA support. Without it, torch.cuda.is_available() will return False.

HTTP API Examples

Additional request examples beyond the Getting Started flow:

# OpenAI-compatible streaming
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Tell a story"}],"stream":true,"max_tokens":500}'

# Anthropic-compatible
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"astrai","system":"You are a helpful assistant.","messages":[{"role":"user","content":"Hello"}],"max_tokens":512}'

# Anthropic-compatible streaming with stop sequences
curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"astrai","messages":[{"role":"user","content":"Write a story"}],"max_tokens":500,"stream":true,"stop_sequences":["The end"]}'

# Health check
curl http://localhost:8000/health

See Inference Guide for SSE streaming format, error codes, and stats endpoint.

Documentation

Document	Description
CLI Reference	Parameters for all CLI tools (train, server, generate, preprocess)
Architecture	System architecture, class diagram & design patterns
Training	Training loop, strategies & formulas
Inference	KVCache, continuous batching, sampling & HTTP API
Data Flow	Data pipeline, storage backends & dataset architecture
Preprocessing	Declarative JSON-driven data preprocessing

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Fork the repository.
Create a feature branch.
Commit your changes.
Open a Pull Request.

For major changes, please open an issue first to discuss what you would like to change.

Community

GitHub Issues: Issue Tracker
Discussions: GitHub Discussions
HuggingFace: Model Hub

License

This project is licensed under the GPL-3.0 License.

A lightweight Transformer framework designed for both high performance and ease of use.

README.md Unescape Escape