EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Supports distributed training via 3D parallelism (tensor, pipeline, and data) with ZeRO optimization, enabling efficient scaling across heterogeneous environments including AWS, supercomputers (Summit, Frontier, LUMI), and AMD MI250X GPUs. Features modern architectural innovations such as rotary/ALiBi positional embeddings, Flash Attention 2, and Mixture-of-Experts, with preset configs for Pythia, PaLM, Falcon, and LLaMA. Integrates with the Hugging Face ecosystem (tokenizers, transformers), supports preference learning (DPO, KTO), and connects to monitoring platforms (WandB, Comet ML) and the Language Model Evaluation Harness.
Stars
7,399
Forks
1,100
Language
Python
License
Apache-2.0
Last pushed
Feb 03, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/EleutherAI/gpt-neox"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
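The same endpoint can be queried from Python. The sketch below only shows URL construction and an anonymous GET; the response schema and any API-key header name are assumptions, since the page documents only the URL and the rate limits:

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository endpoint URL.
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Anonymous access is rate-limited to 100 requests/day; a free key
    # raises this to 1,000/day (how the key is passed is not documented
    # here, so this sketch omits it).
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("EleutherAI", "gpt-neox"))
```

Calling `fetch_quality("EleutherAI", "gpt-neox")` performs the same request as the curl command above.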
Related models
tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
shibing624/textgen
TextGen: Implementation of text generation models, including LLaMA, BLOOM, GPT2, BART, T5, SongNet...
AdityaNG/kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...
EleutherAI/gpt-neo
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
keith2018/TinyGPT
Tiny C++ LLM inference implementation from scratch