x-transformers and Fast-Transformer
These are ecosystem siblings: x-transformers provides a general-purpose transformer implementation framework, while Fast-Transformer offers a specialized alternative attention mechanism (additive attention) that could be integrated into, or benchmarked against, x-transformers' modular architecture.
About x-transformers
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Supports encoder-decoder, decoder-only (GPT-style), and encoder-only (BERT-style) architectures, as well as vision transformers for image classification and multimodal tasks such as image captioning and vision-language modeling. Implements experimental attention features including Flash Attention for memory-efficient training, persistent memory augmentation, and memory tokens, and offers fine-grained control over dropout strategies such as stochastic depth and layer-wise dropout. Built as a PyTorch library with modular components (`TransformerWrapper`, `Encoder`, `Decoder`, `ViTransformerWrapper`) that can be composed flexibly for tasks ranging from language modeling to vision-language understanding.
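A minimal decoder-only sketch, closely following the project's README; the hyperparameter values (vocabulary size, `dim`, `depth`, `heads`) are illustrative, and `attn_flash` is the library's flag for enabling Flash Attention:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Decoder-only (GPT-style) language model assembled from the modular
# components listed above; sizes here are illustrative.
model = TransformerWrapper(
    num_tokens = 20000,      # vocabulary size
    max_seq_len = 1024,      # maximum sequence length
    attn_layers = Decoder(
        dim = 512,           # model width
        depth = 6,           # number of transformer blocks
        heads = 8,           # attention heads per block
        attn_flash = True    # use Flash Attention for memory efficiency
    )
)

tokens = torch.randint(0, 20000, (1, 1024))  # a batch of token ids
logits = model(tokens)                        # shape: (1, 1024, 20000)
```

Swapping `Decoder` for `Encoder`, or wrapping the attention layers in `ViTransformerWrapper` for image inputs, yields the other architectures mentioned above.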
About Fast-Transformer
Rishit-dagli/Fast-Transformer
An implementation of additive attention, as proposed in Fastformer: Additive Attention Can Be All You Need (Wu et al., 2021)
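To make the contrast with full attention concrete, here is a single-head sketch of the additive attention idea in the Fastformer formulation. This is a hypothetical illustration written in PyTorch for easy comparison with x-transformers, not Fast-Transformer's actual API; the class and parameter names are invented for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Minimal single-head additive attention, Fastformer-style (illustrative).

    Instead of pairwise query-key dot products (O(n^2) in sequence length),
    all queries are pooled into one global query vector (O(n)), which then
    modulates the keys element-wise; the modulated keys are pooled the same
    way and modulate the values.
    """
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.w_q = nn.Linear(dim, 1, bias=False)  # scores for pooling queries
        self.w_k = nn.Linear(dim, 1, bias=False)  # scores for pooling keys
        self.to_out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                         # x: (batch, seq, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # Pool all queries into a single global query vector.
        alpha = F.softmax(self.w_q(q).squeeze(-1) * self.scale, dim=-1)
        global_q = torch.einsum('bn,bnd->bd', alpha, q)

        # Element-wise interaction between the global query and each key.
        p = k * global_q.unsqueeze(1)

        # Pool the modulated keys into a single global key vector.
        beta = F.softmax(self.w_k(p).squeeze(-1) * self.scale, dim=-1)
        global_k = torch.einsum('bn,bnd->bd', beta, p)

        # Element-wise interaction with the values, plus a query residual.
        u = v * global_k.unsqueeze(1)
        return self.to_out(u) + q

x = torch.randn(2, 128, 64)
out = AdditiveAttention(64)(x)   # (2, 128, 64), linear in sequence length
```

Because the pooling steps collapse the sequence dimension before any token-to-token interaction, cost grows linearly with sequence length, which is the trade-off this mechanism offers against x-transformers' full attention.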