Attention Mechanism Implementations in ML Frameworks
Implementations and tutorials of attention layers, attention mechanisms, and self-attention architectures for neural networks. Does NOT include broader transformer architectures, vision models, or applications that use attention as a component without focusing on the mechanism itself.
There are 84 attention mechanism implementation projects tracked. Two score above 50 (the Established tier). The highest-rated is philipperemy/keras-attention at 67/100 with 2,815 stars. Only 1 of the top 10 is actively maintained.
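For context on what these projects implement, the core operation — scaled dot-product attention from Vaswani et al. (2017) — fits in a few lines of NumPy. This is an illustrative sketch, not code from any listed repository:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    Returns an (n_q, d_v) array: one value-weighted average per query.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries of width 8
K = rng.normal(size=(6, 8))   # 6 keys of width 8
V = rng.normal(size=(6, 16))  # 6 values of width 16
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 16)
```

Most repositories below are variations on exactly this computation: different score functions (Luong, Bahdanau), approximations (Linformer, Nyströmformer), or kernel-level optimizations (FlashMHA, Triton block-sparse).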
Get all 84 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=attention-mechanism-implementations&limit=20"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
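The same endpoint can be queried from Python with only the standard library. The URL parameters come straight from the curl example above; the response schema is not documented on this page, so field-level parsing is left to the reader:

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the dataset query URL from its documented parameters."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE}?{urlencode(params)}"

url = build_url("ml-frameworks", "attention-mechanism-implementations")
print(url)

# Uncomment to fetch live (counts against the 100 requests/day quota):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```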
| # | Framework | Description | Tier |
|---|---|---|---|
| 1 | philipperemy/keras-attention | Keras Attention Layer (Luong and Bahdanau scores). | Established |
| 2 | tatp22/linformer-pytorch | My take on a practical implementation of Linformer for Pytorch. | Established |
| 3 | lucidrains/fast-weight-attention | Implementation of Fast Weight Attention | Emerging |
| 4 | datalogue/keras-attention | Visualizing RNNs using the attention mechanism | Emerging |
| 5 | ematvey/hierarchical-attention-networks | Document classification with Hierarchical Attention Networks in TensorFlow... | Emerging |
| 6 | thushv89/attention_keras | Keras Layer implementation of Attention for Sequential models | Emerging |
| 7 | willGuimont/learnable_fourier_positional_encoding | Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding | Emerging |
| 8 | davidmascharka/tbd-nets | PyTorch implementation of "Transparency by Design: Closing the Gap Between... | Emerging |
| 9 | soskek/attention_is_all_you_need | Transformer of "Attention Is All You Need" (Vaswani et al. 2017) by Chainer. | Emerging |
| 10 | balavenkatesh3322/CV-pretrained-model | A collection of computer vision pre-trained models. | Emerging |
| 11 | kyegomez/FlashMHA | A simple pytorch implementation of Flash MultiHead Attention | Emerging |
| 12 | brandokoch/attention-is-all-you-need-paper | Original transformer paper: Implementation of Vaswani, Ashish, et al.... | Emerging |
| 13 | kushalj001/pytorch-question-answering | Important paper implementations for Question Answering using PyTorch | Emerging |
| 14 | tlatkowski/multihead-siamese-nets | Implementation of Siamese Neural Networks built upon multihead attention... | Emerging |
| 15 | tensorflow/similarity | TensorFlow Similarity is a python package focused on making similarity... | Emerging |
| 16 | Ugenteraan/Deep_Hierarchical_Classification | PyTorch Implementation of Deep Hierarchical Classification for Category... | Emerging |
| 17 | rockerBOO/lora-inspector | LoRA (Low-Rank Adaptation) inspector for Stable Diffusion | Emerging |
| 18 | macournoyer/neuralconvo | Neural conversational model in Torch | Emerging |
| 19 | Zhenye-Na/DA-RNN | 📃 Unofficial PyTorch Implementation of DA-RNN (arXiv:1704.02971) | Emerging |
| 20 | EdGENetworks/attention-networks-for-classification | Hierarchical Attention Networks for Document Classification in PyTorch | Emerging |
| 21 | opengeos/earthformer | A Python package for Earth forecasting transformer | Emerging |
| 22 | Rishit-dagli/Nystromformer | An implementation of the Nyströmformer, using Nystrom method to approximate... | Emerging |
| 23 | lsdefine/attention-is-all-you-need-keras | A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need | Emerging |
| 24 | rentainhe/visualization | A collection of visualization functions | Emerging |
| 25 | poloclub/dodrio | Exploring attention weights in transformer-based models with linguistic knowledge. | Emerging |
| 26 | kyegomez/AoA-torch | Implementation of Attention on Attention in Zeta | Emerging |
| 27 | szagoruyko/attention-transfer | Improving Convolutional Networks via Attention Transfer (ICLR 2017) | Emerging |
| 28 | cbaziotis/neat-vision | Neat (Neural Attention) Vision is a visualization tool for the attention... | Emerging |
| 29 | tatp22/multidim-positional-encoding | An implementation of 1D, 2D, and 3D positional encoding in Pytorch and TensorFlow | Emerging |
| 30 | davidsvy/cosformer-pytorch | Unofficial PyTorch implementation of the paper "cosFormer: Rethinking... | Emerging |
| 31 | sara-nl/attention-sampling-pytorch | A PyTorch implementation of the paper "Processing Megapixel Images... | Emerging |
| 32 | soobinseo/Attentive-Neural-Process | A Pytorch Implementation of Attentive Neural Process | Emerging |
| 33 | castorini/MP-CNN-Torch | Multi-Perspective Convolutional Neural Networks for modeling textual... | Emerging |
| 34 | pandeykartikey/Hierarchical-Attention-Network | Implementation of Hierarchical Attention Networks in PyTorch | Emerging |
| 35 | MurrellGroup/InvariantPointAttention.jl | Julia implementation of AlphaFold 2's Invariant Point Attention | Emerging |
| 36 | kyegomez/ShallowFF | Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward... | Emerging |
| 37 | Saquib764/omini-kontext | An inference and training framework for multiple image input in Flux Kontext dev | Experimental |
| 38 | GalacticExchange/pretrained | Pretrained is the most complete and frequently updated list of pretrained... | Experimental |
| 39 | abcamiletto/mmit | A CV library in python, design and experiment with models using any encoder... | Experimental |
| 40 | Akrielz/vision_models_playground | Playground for testing and implementing various Vision Models | Experimental |
| 41 | kyegomez/Tree-Attention-Torch | An implementation of Tree-Attention in PyTorch because it's in JAX for some reason | Experimental |
| 42 | Rishit-dagli/Compositional-Attention | An implementation of Compositional Attention: Disentangling Search and... | Experimental |
| 43 | billpsomas/efficient-probing | Official implementation of the ICLR 2026 paper... | Experimental |
| 44 | esceptico/perceiver-io | Unofficial implementation of Perceiver IO | Experimental |
| 45 | SkBlaz/attviz | Dissecting Transformers via attention visualization | Experimental |
| 46 | Lanerra/DWARF | O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet... | Experimental |
| 47 | tobna/TaylorShift | Code for the paper "TaylorShift: Shifting the... | Experimental |
| 48 | Awni00/abstract_transformer | Project repo associated with the paper "Disentangling and... | Experimental |
| 49 | m-a-n-i-f-e-s-t/power-attention | Attention Kernels for Symmetric Power Transformers | Experimental |
| 50 | sumo43/miniformer | Minimal Transformer re-implementation inspired by minGPT. Can be used as a... | Experimental |
| 51 | anmolg1997/LoRA-Factory | LoRA adapter lifecycle platform: DAG pipelines,... | Experimental |
| 52 | mzuhair9933/PoPE-pytorch | ⚙️ Implement polar coordinate positional embedding in PyTorch for efficient... | Experimental |
| 53 | Mogalina/transformer | Minimal Transformer implementation in pure C based on the architecture from... | Experimental |
| 54 | EricLBuehler/PerceiverIO-Classifier | A classifier based on PerceiverIO | Experimental |
| 55 | kyegomez/CT | Implementation of the attention and transformer from "Building Blocks for a... | Experimental |
| 56 | Rooooyy/HiTIN | Code for ACL 2023 paper "HiTIN: Hierarchy-aware Tree Isomorphism Network for... | Experimental |
| 57 | TiagoFilipeSousaGoncalves/survey-attention-medical-imaging | Implementation of the paper "A survey on attention mechanisms for medical... | Experimental |
| 58 | AlphafromZion/lora-lab | LoRA Training Config Generator: optimal configs for SDXL, FLUX,... | Experimental |
| 59 | BobMcDear/attention-in-vision | PyTorch implementation of popular attention mechanisms in vision | Experimental |
| 60 | MaitySubhajit/KArAt | Kolmogorov-Arnold Attention: Is Learnable Attention Better for Vision Transformers? | Experimental |
| 61 | hrbigelow/transformer-aiayn | The Transformer from "Attention is All You Need" | Experimental |
| 62 | btrojan-official/HypeLoRA | HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model... | Experimental |
| 63 | Iro96/Carbon | Carbon is a pure C++ Transformer framework inspired by GPT, featuring... | Experimental |
| 64 | ccfco/External-Attention-tensorflow | 🍀 Tensorflow implementation of various Attention Mechanisms, MLP,... | Experimental |
| 65 | IBM/DEFT | Official pytorch code for "From PEFT to DEFT: Parameter Efficient Finetuning... | Experimental |
| 66 | sinpoce/ai-trainer-lite | 🤖 Train your own AI model in 3 steps: text classification, image classification, and tabular AutoML, with a Gradio UI; no GPU or ML background required | Experimental |
| 67 | ross-sec/fractal_attention_analysis | A mathematical framework for analyzing transformer attention mechanisms... | Experimental |
| 68 | Nemesis-12/multihead-latent-attention | Implementation of Multi-head Latent Attention (MLA) from DeepSeek-V2 | Experimental |
| 69 | cnygaard/FractalHTransformer | Fractal Hierarchical Transformer: multi-resolution causal attention patterns... | Experimental |
| 70 | ebrahimpichka/attn-PG-RL-tsp | A PyTorch implementation of the attention-based Policy Gradient RL for... | Experimental |
| 71 | ghosthamlet/transformers-rs | Rust Implementation of paper: Attention Is All You... | Experimental |
| 72 | externalPointerVariable/AttentionIsAllYouNeed | Implementing Transformers from Scratch | Experimental |
| 73 | biswajitsahoo1111/D2L_Attention_Mechanisms_in_TF | Tensorflow 2 code for the Attention Mechanisms chapter... | Experimental |
| 74 | SCCSMARTCODE/attention-is-all-you-need-from-scratch | A complete implementation of the Transformer architecture from scratch,... | Experimental |
| 75 | ducnt2406/AI-Headshot | Easy-to-use toolkit for training LoRA models with SimpleTuner, featuring a... | Experimental |
| 76 | romizone/simulasiLLM | 🧠 Interactive LLM Attention Simulation: visualize how GPT-2 transformers... | Experimental |
| 77 | adi-mish/miniformer | Miniformer is a lightweight PyTorch transformer library for researchers,... | Experimental |
| 78 | vijaysai1102/polyglot-neural-architecture | A multimodal deep learning project that integrates SQL, MongoDB, Graph, and... | Experimental |
| 79 | priyanshujiiii/awesome-Attention | Resources and references on solved and unsolved problems in attention mechanisms. | Experimental |
| 80 | nexus-4/self-attention-mechanism | Implementation of self-attention mechanism based on the "Attention is all... | Experimental |
| 81 | pointlander/bento | An aware attention free simplified image transformer | Experimental |
| 82 | TiagoFilipeSousaGoncalves/attention-mechanisms-healthcare | Implementation of the paper "Preliminary Study on the Impact of Attention... | Experimental |
| 83 | wanga90/halonet-pytorch | Implementation of the 😇 Attention layer from the paper, Scaling Local... | Experimental |
| 84 | zhengqigao/hbsattn | A high-performance Block Sparse Attention kernel in Triton | Experimental |