lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Supports encoder-decoder, decoder-only (GPT), and encoder-only (BERT) architectures alongside vision transformers for image classification and multimodal tasks like image captioning and vision-language modeling. Implements experimental attention mechanisms including Flash Attention for memory-efficient training, persistent memory augmentation, and memory tokens, while offering fine-grained control over dropout strategies including stochastic depth and layer-wise dropout. Built as a PyTorch library with modular components (`TransformerWrapper`, `Encoder`, `Decoder`, `ViTransformerWrapper`) enabling flexible composition for tasks ranging from language modeling to vision-language understanding.
5,808 stars. Used by 6 other packages. Actively maintained with 9 commits in the last 30 days. Available on PyPI.
Stars: 5,808
Forks: 507
Language: Python
License: MIT
Category:
Last pushed: Mar 27, 2026
Commits (30d): 9
Dependencies: 8
Reverse dependents: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lucidrains/x-transformers"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Related models
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/locoformer
LocoFormer - Generalist Locomotion via Long-Context Adaptation
helpmefindaname/transformer-smaller-training-vocab
Temporarily remove unused tokens during training to save RAM and speed up training.