release : v1.3.7
Features
- FSDP parallel backend with zero-redundancy sharded training
- LoRA fine-tuning module with low-rank adapter injection and persistence
- NTK-Aware RoPE dynamic scaling, extending context window limit
- MMLU evaluation script for standardized model knowledge assessment
- load_json/load_safetensors broadcast mechanism for cross-node distributed loading

Refactors
- Storage layer refactored to Store pattern, removed Fetcher layer, supporting multi-segment data with explicit length
- Training backend refactored to Executor pattern (none/ddp/fsdp), decoupling parallel logic
- Inference protocol layer refactored to Strategy/Builder pattern with independent OpenAI/Anthropic responders
- Unified serialization layer, eliminating scattered I/O paths
- Removed JSONStore from data pipeline, unified to H5/Bin dual format
- Simplified _disable_random_init, moved scheduler into sync block
- Removed -> None return annotations, split FSDP parameters

Fixes
- Disabled DDP static_graph to prevent no_sync/backward conflict under PyTorch 2.7.1
- Checkpoint resume restores optimizer/scheduler state and sampler remaining length
- Unwrap DDP/FSDP on checkpoint save to avoid module. prefix
- start_epoch/start_batch determined by user args, no longer overridden by checkpoint
- Left padding in perplexity.py causing incorrect PPL with batch>1
- Storage multi-segment bug, switched JSON to JSONL
- Early abort on task_extend failure after decode, notify waiting tasks on scheduler crash

Docs
- Synced architecture/training/inference/dataflow/params docs to actual code

Tests
- Completed inference protocol layer unit test coverage
- Added LoRA module tests
- Filled storage layer test gaps
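
Note on the "Unwrap DDP/FSDP on checkpoint save" fix: a parallel wrapper that holds the real model in a `.module` attribute reports its parameters under keys prefixed with `module.`, so a checkpoint saved from the wrapper cannot be loaded directly into a bare model. The sketch below illustrates the idea in plain Python without torch. It is not the project's code: `unwrap`, `strip_prefix`, `TinyModel`, and `FakeWrapper` are hypothetical names used only for illustration.

```python
from collections import OrderedDict


def unwrap(model):
    """Follow `.module` attributes until the innermost object is reached.

    Assumption: the wrapper exposes the wrapped model as `.module`.
    A model that has its own attribute named `module` would be unwrapped
    by mistake, so real code should check the wrapper type instead.
    """
    while hasattr(model, "module"):
        model = model.module
    return model


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from keys."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


class TinyModel:
    """Stand-in for a model; only provides state_dict()."""

    def state_dict(self):
        return OrderedDict(weight=[1.0, 2.0])


class FakeWrapper:
    """Stand-in for a parallel wrapper that prefixes keys with 'module.'."""

    def __init__(self, module):
        self.module = module

    def state_dict(self):
        inner = self.module.state_dict()
        return OrderedDict(("module." + key, value) for key, value in inner.items())


wrapped = FakeWrapper(TinyModel())

# Saving from the wrapper leaks the prefix into the checkpoint keys.
wrapped_keys = list(wrapped.state_dict())

# Option 1: unwrap before saving, so keys match the bare model.
clean_keys = list(unwrap(wrapped).state_dict())

# Option 2: repair an already-saved state dict by stripping the prefix.
stripped_keys = list(strip_prefix(wrapped.state_dict()))
```

Unwrapping before saving keeps checkpoints independent of the training backend, which is presumably why the fix is applied at save time rather than at load time.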
parent b37c3d000c
commit a3275423a4
@@ -1,4 +1,4 @@
-__version__ = "1.3.6"
+__version__ = "1.3.7"
 __author__ = "ViperEkura"
 
 from astrai.config import (