EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Supports distributed training via 3D parallelism (tensor, pipeline, and data) with ZeRO optimization, enabling efficient scaling across heterogeneous environments including AWS, supercomputers (Summit, Frontier, LUMI), and AMD MI250X GPUs. Features modern architectural innovations such as rotary/ALiBi positional embeddings, Flash Attention 2, and Mixture-of-Experts, with preset configs for Pythia, PaLM, Falcon, and LLaMA. Integrates with the Hugging Face ecosystem (tokenizers, transformers), supports preference learning (DPO, KTO), and connects to monitoring platforms (WandB, Comet ML) and the Language Model Evaluation Harness.
Stars
7,399
Forks
1,100
Language
Python
License
Apache-2.0
Last pushed
Feb 03, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/EleutherAI/gpt-neox"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
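The same endpoint can be queried from Python. The sketch below only shows URL construction and an anonymous GET; the response schema and any API-key header name are assumptions, since the page documents only the URL and the rate limits:

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository endpoint URL.
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Anonymous access is rate-limited to 100 requests/day; a free key
    # raises this to 1,000/day (how the key is passed is not documented
    # here, so this sketch omits it).
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("EleutherAI", "gpt-neox"))
```

Calling `fetch_quality("EleutherAI", "gpt-neox")` performs the same request as the curl command above.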
Related models
tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
shibing624/textgen
TextGen: Implementation of text generation models, including LLaMA, BLOOM, GPT2, BART, T5, SongNet...
AdityaNG/kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...
EleutherAI/gpt-neo
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
keith2018/TinyGPT
Tiny C++ LLM inference implementation from scratch