torchspec-project/TorchSpec
A PyTorch native library for training speculative decoding models
Decouples inference and training via a disaggregated pipeline that streams hidden states from vLLM or SGLang inference engines to distributed training workers through Mooncake's in-memory store, enabling independent scaling of each component. Integrates directly with PyTorch FSDP for distributed training, uses vLLM's Worker Extension API to avoid RPC serialization overhead, and supports vocabulary pruning with HuggingFace checkpoint conversion. Includes production examples for Qwen3, Kimi-K2.5, and MiniMax-M2.5 models with configurable training modes for resuming interrupted runs or continual training from existing weights.
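The disaggregated pipeline described above can be sketched with a toy in-memory queue standing in for Mooncake's store; the names and structure here are illustrative assumptions, not TorchSpec's actual API. Inference workers publish hidden states and training workers drain them independently, which is what allows each side to scale on its own:

```python
import queue
import threading

# Toy stand-in for the Mooncake in-memory store: a bounded queue that
# decouples inference-side producers from training-side consumers.
store = queue.Queue(maxsize=8)
SENTINEL = None  # end-of-stream marker

def inference_worker(num_batches):
    # Pretend each "hidden state" is a small list of floats captured
    # from the target model during generation.
    for step in range(num_batches):
        hidden = [0.1 * step] * 4
        store.put((f"batch-{step}", hidden))
    store.put(SENTINEL)  # signal that the stream is finished

def training_worker(received):
    # Drain the store and "train" the draft model on each batch.
    while True:
        item = store.get()
        if item is SENTINEL:
            break
        key, hidden = item
        received.append(key)

received = []
producer = threading.Thread(target=inference_worker, args=(3,))
consumer = threading.Thread(target=training_worker, args=(received,))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(received)  # -> ['batch-0', 'batch-1', 'batch-2']
```

In the real system the queue is replaced by a networked store and the workers run in separate processes (vLLM/SGLang engines on one side, FSDP trainers on the other), but the back-pressure pattern is the same.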
Stars
32
Forks
3
Language
Python
License
MIT
Category
transformers
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/torchspec-project/TorchSpec"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding