x-transformers and attn_res

These projects are **complements** rather than competitors. x-transformers provides a flexible, production-ready transformer framework with experimental features, while attn_res offers a minimal implementation of one specific architectural idea (Attention Residuals combined with GQA). In principle, attn_res could be integrated as a module within x-transformers' extensible design.

| | x-transformers | attn_res |
|---|---|---|
| Score | 79 (Verified) | 48 (Emerging) |
| Maintenance | 20/25 | 13/25 |
| Adoption | 15/25 | 9/25 |
| Maturity | 25/25 | 18/25 |
| Community | 19/25 | 8/25 |
| Stars | 5,808 | 8 |
| Forks | 507 | 1 |
| Downloads | | 164 |
| Commits (30d) | 9 | 0 |
| Language | Python | Python |
| License | MIT | Apache-2.0 |
| Risk flags | None | None |
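
The overall scores appear to be the plain sum of the four 25-point categories (Maintenance, Adoption, Maturity, Community). The page does not state this rule, so it is an inference from the numbers. The short check below confirms the arithmetic under that assumption. The names `SUBSCORES` and `total_score` are hypothetical and used only for illustration.

```python
# Sub-scores copied from the comparison above (each category is out of 25).
SUBSCORES = {
    "x-transformers": {"Maintenance": 20, "Adoption": 15, "Maturity": 25, "Community": 19},
    "attn_res": {"Maintenance": 13, "Adoption": 9, "Maturity": 18, "Community": 8},
}


def total_score(categories):
    """Assumed aggregation: an unweighted sum of the four category scores."""
    return sum(categories.values())


totals = {name: total_score(cats) for name, cats in SUBSCORES.items()}
print(totals)  # {'x-transformers': 79, 'attn_res': 48}
```

Both totals match the headline scores, which is consistent with an unweighted sum out of 100.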

About x-transformers

lucidrains/x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Supports encoder-decoder, decoder-only (GPT), and encoder-only (BERT) architectures alongside vision transformers for image classification and multimodal tasks like image captioning and vision-language modeling. Implements experimental attention mechanisms including Flash Attention for memory-efficient training, persistent memory augmentation, and memory tokens, while offering fine-grained control over dropout strategies including stochastic depth and layer-wise dropout. Built as a PyTorch library with modular components (`TransformerWrapper`, `Encoder`, `Decoder`, `ViTransformerWrapper`) enabling flexible composition for tasks ranging from language modeling to vision-language understanding.

About attn_res

kyegomez/attn_res

A clean, single-file PyTorch implementation of Attention Residuals (Kimi Team, MoonshotAI, 2026), integrated with Grouped Query Attention (GQA), SwiGLU feed-forward networks, and Rotary Position Embeddings (RoPE).

Scores updated daily from GitHub, PyPI, and npm data.