lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Supports encoder-decoder, decoder-only (GPT), and encoder-only (BERT) architectures alongside vision transformers for image classification and multimodal tasks like image captioning and vision-language modeling. Implements experimental attention mechanisms including Flash Attention for memory-efficient training, persistent memory augmentation, and memory tokens, while offering fine-grained control over dropout strategies including stochastic depth and layer-wise dropout. Built as a PyTorch library with modular components (`TransformerWrapper`, `Encoder`, `Decoder`, `ViTransformerWrapper`) enabling flexible composition for tasks ranging from language modeling to vision-language understanding.
5,808 stars. Used by 6 other packages. Actively maintained with 9 commits in the last 30 days. Available on PyPI.
Stars: 5,808
Forks: 507
Language: Python
License: MIT
Category:
Last pushed: Mar 27, 2026
Commits (30d): 9
Dependencies: 8
Reverse dependents: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/lucidrains/x-transformers"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Related models
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/locoformer
LocoFormer - Generalist Locomotion via Long-Context Adaptation
helpmefindaname/transformer-smaller-training-vocab
Temporarily remove unused tokens during training to save RAM and speed up training.