EleutherAI/gpt-neo

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Archived

/ 100

Emerging

Beyond the standard GPT architecture, it supports several attention variants (including local and linear attention), mixture-of-experts layers, and axial positional embeddings. Built on mesh-tensorflow, it trains across TPU and GPU clusters using both data and model parallelism, which lets it scale to multi-billion-parameter models. Pre-trained checkpoints with 1.3B and 2.7B parameters, trained on The Pile, are available and can be loaded with Hugging Face Transformers for inference.
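Local attention is one of the variants listed above. As a rough illustration only, here is a pure-Python sketch of the general idea, not gpt-neo's mesh-tensorflow code: local attention behaves like a causal mask restricted to a sliding window of recent positions. The function names and the `window` parameter are hypothetical and chosen for this example.

```python
import math
from typing import List


def local_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Sliding-window (local) causal attention: each position sees itself and
    at most `window - 1` earlier positions. (Hypothetical helper.)
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def full_causal_mask(seq_len: int) -> List[List[bool]]:
    """Standard GPT causal mask: position i sees every position j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_attention_weights(
    scores: List[List[float]], mask: List[List[bool]]
) -> List[List[float]]:
    """Row-wise softmax over `scores`, giving zero weight where mask is False.

    Every row of a causal mask allows at least the diagonal, so each row has
    at least one allowed entry.
    """
    weights = []
    for score_row, mask_row in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, m in zip(score_row, mask_row) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # 'x' marks allowed (query, key) pairs for a 6-token sequence, window 3.
    for row in local_causal_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
```

When the window is at least as long as the sequence, the local mask reduces to the ordinary causal mask; a smaller window caps how far back each token can look, which is the property that makes local attention cheaper on long sequences.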

8,286 stars. No commits in the last 6 months.

Archived, Stale (6m), No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 21 / 25


Stars: 8,286
Forks: 963
Language: Python
License: MIT


Higher-rated alternatives

tabularis-ai/be_great

A novel approach for synthesizing tabular data using pretrained large language models

EleutherAI/gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron...

shibing624/textgen

TextGen: Implementation of Text Generation models, include LLaMA, BLOOM, GPT2, BART, T5, SongNet...

AdityaNG/kan-gpt

The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...

keith2018/TinyGPT

Tiny C++ LLM inference implementation from scratch
