EleutherAI/gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Score: 58 / 100 (Established)

Supports distributed training via 3D parallelism (tensor, pipeline, and data) with ZeRO optimization, enabling efficient scaling across a range of environments, including AWS, supercomputers (Summit, Frontier, LUMI), and AMD MI250X GPUs. Features modern architectural options such as rotary/ALiBi positional embeddings, Flash Attention 2, and Mixture-of-Experts, with preset configs for Pythia, PaLM, Falcon, and LLaMA. Integrates with the Hugging Face ecosystem (tokenizers, transformers), supports preference learning (DPO, KTO), and connects to monitoring platforms (WandB, Comet ML) and the Language Model Evaluation Harness.
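As a minimal sketch of the Hugging Face interoperability mentioned above: the GPT-NeoX architecture ships in transformers, so checkpoints trained with this library (for example the Pythia models) can be loaded with the standard Auto classes. The checkpoint name below is only an illustrative choice.

# Minimal sketch: loading a Pythia checkpoint (trained with GPT-NeoX) via
# Hugging Face transformers. "EleutherAI/pythia-70m" is just one example;
# any GPT-NeoX-architecture checkpoint on the Hub loads the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

inputs = tokenizer("GPT-NeoX is a library for", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))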

No package · No dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 22 / 25
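The four component scores account for the headline number: 10 + 10 + 16 + 22 = 58, out of a possible 4 × 25 = 100.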

Stars: 7,399
Forks: 1,100
Language: Python
License: Apache-2.0
Last pushed: Feb 03, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/EleutherAI/gpt-neox"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
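
The same request can be made from Python; a sketch is below, assuming the requests package is installed. The response schema is not documented on this page, so the snippet simply prints whatever JSON the endpoint returns rather than assuming specific field names.

# Sketch: fetch the same quality data from Python. Assumes `requests` is
# installed; field names in the response are not documented here, so the
# JSON is printed as-is rather than picking out specific keys.
import json
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/EleutherAI/gpt-neox"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))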