torchspec-project/TorchSpec
A PyTorch native library for training speculative decoding models
Decouples inference and training via a disaggregated pipeline that streams hidden states from vLLM or SGLang inference engines to distributed training workers through Mooncake's in-memory store, enabling independent scaling of each component. Integrates directly with PyTorch FSDP for distributed training, uses vLLM's Worker Extension API to avoid RPC serialization overhead, and supports vocabulary pruning with HuggingFace checkpoint conversion. Includes production examples for Qwen3, Kimi-K2.5, and MiniMax-M2.5 models with configurable training modes for resuming interrupted runs or continual training from existing weights.
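The disaggregated pipeline described above can be sketched with a toy in-memory queue standing in for Mooncake's store; the names and structure here are illustrative assumptions, not TorchSpec's actual API. Inference workers publish hidden states and training workers drain them independently, which is what allows each side to scale on its own:

```python
import queue
import threading

# Toy stand-in for the Mooncake in-memory store: a bounded queue that
# decouples inference-side producers from training-side consumers.
store = queue.Queue(maxsize=8)
SENTINEL = None  # end-of-stream marker

def inference_worker(num_batches):
    # Pretend each "hidden state" is a small list of floats captured
    # from the target model during generation.
    for step in range(num_batches):
        hidden = [0.1 * step] * 4
        store.put((f"batch-{step}", hidden))
    store.put(SENTINEL)  # signal that the stream is finished

def training_worker(received):
    # Drain the store and "train" the draft model on each batch.
    while True:
        item = store.get()
        if item is SENTINEL:
            break
        key, hidden = item
        received.append(key)

received = []
producer = threading.Thread(target=inference_worker, args=(3,))
consumer = threading.Thread(target=training_worker, args=(received,))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(received)  # -> ['batch-0', 'batch-1', 'batch-2']
```

In the real system the queue is replaced by a networked store and the workers run in separate processes (vLLM/SGLang engines on one side, FSDP trainers on the other), but the back-pressure pattern is the same.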
Stars
32
Forks
3
Language
Python
License
MIT
Category
transformers
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/torchspec-project/TorchSpec"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
sgl-project/SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
structuredllm/syncode
Efficient and general syntactical decoding for Large Language Models
SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
romsto/Speculative-Decoding
Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan...
hao-ai-lab/JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding