InternLM/xtuner
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Implements dropless FSDP training without expert parallelism for 200B+ MoE models, and supports 64k-token sequence lengths through memory optimizations or DeepSpeed Ulysses sequence parallelism. Achieves higher throughput than traditional 3D parallelism at MoE scales above 200B parameters, with optimized support for both NVIDIA GPUs and Ascend NPUs. Integrates with LMDeploy for inference and supports multimodal pre-training, supervised fine-tuning, and reinforcement learning algorithms such as GRPO.
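For orientation, a minimal launch sketch based on the CLI documented in earlier xtuner releases (xtuner list-cfg, xtuner train); the config name below is illustrative, and the MoE-focused engine may expose different entry points.

pip install -U xtuner    # install from PyPI
xtuner list-cfg          # list the built-in training configs
xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2    # illustrative config name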
5,096 stars and 1,643 monthly downloads. Actively maintained with 72 commits in the last 30 days. Available on PyPI.
Stars: 5,096
Forks: 405
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 1,643
Commits (30d): 72
Dependencies: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/InternLM/xtuner"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
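The same endpoint can be scripted; a minimal sketch, assuming the response is JSON (jq is used here only for pretty-printing):

curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/InternLM/xtuner" | jq .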
Related tools
AmanPriyanshu/GPT-OSS-MoE-ExpertFingerprinting
ExpertFingerprinting: Behavioral Pattern Analysis and Specialization Mapping of Experts in...
arm-education/Advanced-AI-Mixture-of-Experts
Hands-on course materials for ML engineers to implement and optimize Mixture of Experts models:...
SuperBruceJia/Awesome-Mixture-of-Experts
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of...
rioyokotalab/optimal-sparsity
[ICLR 2026 Oral] Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
robinzixuan/FROST
[ICLR 2026] FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning