Mixture-of-Experts LLMs: Transformer Models

There are 19 mixture-of-experts LLM projects tracked. The highest-rated is EfficientMoE/MoE-Infinity, scoring 43/100 with 288 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=mixture-of-experts-llms&limit=20"

Open to everyone: 100 requests/day with no API key needed. Get a free key for 1,000 requests/day.
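
If you prefer to pull the data programmatically, here is a minimal Python sketch using the `requests` library against the endpoint above. The exact response schema is not documented on this page, so the field names used below (`projects`, `name`, `score`, `tier`) are assumptions for illustration only.

```python
# Minimal sketch: fetch the mixture-of-experts dataset and print each entry.
# Assumes a JSON response; the field names below ("projects", "name",
# "score", "tier") are guesses about the schema, not documented by the API.
import requests

URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"
params = {
    "domain": "transformers",
    "subcategory": "mixture-of-experts-llms",
    "limit": 20,
}

resp = requests.get(URL, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

# Assumed shape: {"projects": [{"name": ..., "score": ..., "tier": ...}, ...]}
for project in data.get("projects", []):
    print(f'{project.get("score", "?"):>3}  {project.get("tier", "?"):<12}  {project.get("name", "?")}')
```
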

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | EfficientMoE/MoE-Infinity | PyTorch library for cost-effective, fast and easy serving of MoE models. | 43 | Emerging |
| 2 | jaisidhsingh/pytorch-mixtures | One-stop solutions for Mixture of Expert modules in PyTorch. | 42 | Emerging |
| 3 | raymin0223/mixture_of_recursions | Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive... | 41 | Emerging |
| 4 | thu-nics/MoA | [CoLM'25] The official implementation of the paper | 39 | Emerging |
| 5 | AviSoori1x/makeMoE | From scratch implementation of a sparse mixture of experts language model... | 39 | Emerging |
| 6 | CASE-Lab-UMD/Unified-MoE-Compression | The official implementation of the paper "Towards Efficient Mixture of... | 37 | Emerging |
| 7 | MoonshotAI/MoBA | MoBA: Mixture of Block Attention for Long-Context LLMs | 37 | Emerging |
| 8 | ByteDance-Seed/FlexPrefill | Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse... | 35 | Emerging |
| 9 | efeslab/fiddler | [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration | 35 | Emerging |
| 10 | FareedKhan-dev/qwen3-MoE-from-scratch | A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch | 33 | Emerging |
| 11 | lliai/D2MoE | D^2-MoE: Delta Decompression for MoE-based LLMs Compression | 30 | Emerging |
| 12 | SkyworkAI/MoE-plus-plus | [ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with... | 29 | Experimental |
| 13 | dmis-lab/Monet | [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers | 28 | Experimental |
| 14 | CASE-Lab-UMD/Router-Tuning-Mixture-of-Depths | The open-source Mixture of Depths code and the official implementation of... | 27 | Experimental |
| 15 | cmu-flame/FLAME-MoE | Official repository for FLAME-MoE: A Transparent End-to-End Research... | 26 | Experimental |
| 16 | UNITES-Lab/HEXA-MoE | Official code for the paper "HEXA-MoE: Efficient and Heterogeneous-Aware MoE... | 17 | Experimental |
| 17 | Spico197/MoE-SFT | 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction... | 16 | Experimental |
| 18 | RoyZry98/T-REX-Pytorch | [Arxiv 2025] Official code for T-REX: Mixture-of-Rank-One-Experts with... | 14 | Experimental |
| 19 | zhongshsh/MoExtend | ACL 2024 (SRW), Official Codebase of our Paper: "MoExtend: Tuning New... | 14 | Experimental |