InternLM/xtuner
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Implements dropless FSDP training without expert parallelism for 200B+ MoE models, and supports 64k-token sequence lengths through memory optimizations or DeepSpeed Ulysses sequence parallelism. Achieves higher throughput than traditional 3D parallelism at MoE scales above 200B parameters, with optimized support for both NVIDIA GPUs and Ascend NPUs. Integrates with LMDeploy for inference and supports multimodal pre-training, supervised fine-tuning, and reinforcement learning algorithms such as GRPO.
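For orientation, a minimal launch sketch based on the CLI documented in earlier xtuner releases (xtuner list-cfg, xtuner train); the config name below is illustrative, and the MoE-focused engine may expose different entry points.

pip install -U xtuner    # install from PyPI
xtuner list-cfg          # list the built-in training configs
xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2    # illustrative config name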
5,096 stars and 1,643 monthly downloads. Actively maintained with 72 commits in the last 30 days. Available on PyPI.
Stars: 5,096
Forks: 405
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 1,643
Commits (30d): 72
Dependencies: 15
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/InternLM/xtuner"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
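The same endpoint can be scripted; a minimal sketch, assuming the response is JSON (jq is used here only for pretty-printing):

curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/InternLM/xtuner" | jq .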
Related tools
AmanPriyanshu/GPT-OSS-MoE-ExpertFingerprinting
ExpertFingerprinting: Behavioral Pattern Analysis and Specialization Mapping of Experts in...
arm-education/Advanced-AI-Mixture-of-Experts
Hands-on course materials for ML engineers to implement and optimize Mixture of Experts models:...
SuperBruceJia/Awesome-Mixture-of-Experts
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of...
rioyokotalab/optimal-sparsity
[ICLR 2026 Oral] Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
robinzixuan/FROST
[ICLR 2026] FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning