Attention Mechanism Implementations (ML Frameworks)

Implementations and tutorials of attention layers, attention mechanisms, and self-attention architectures for neural networks. Does NOT include broader transformer architectures, vision models, or applications that use attention as a component without focusing on the mechanism itself.
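For orientation, the core operation most of these repositories implement is scaled dot-product attention. A minimal NumPy sketch (written for this page, not taken from any listed project):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of value vectors

# Self-attention: queries, keys, and values all come from the same sequence
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # 4 tokens, model dimension 8
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                     # (4, 8)
```

Projects below differ mainly in how they approximate or restructure this computation (Linformer, Nyströmformer, block-sparse kernels, etc.).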

84 attention mechanism implementation projects are tracked. Two score above 50 (the Established tier). The highest-rated is philipperemy/keras-attention at 67/100 with 2,815 stars. Only 1 of the top 10 is actively maintained.

Get the project list as JSON (the example below returns the top 20; raise `limit` to fetch all 84):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=attention-mechanism-implementations&limit=20"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
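The same endpoint can be queried from a script. A small Python sketch using only the standard library; the endpoint and query parameters are taken from the curl example above, but the response schema is not documented here, so the helper returns the parsed JSON as-is:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit):
    """Assemble the query URL with the parameters shown in the docs above."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"

def fetch_projects(limit=84):
    """Fetch the project list; limit=84 retrieves every tracked project."""
    url = build_url("ml-frameworks", "attention-mechanism-implementations", limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Unauthenticated callers get 100 requests/day, so cache the response locally rather than re-fetching on every run.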

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | philipperemy/keras-attention | Keras Attention Layer (Luong and Bahdanau scores). | 67 | Established |
| 2 | tatp22/linformer-pytorch | My take on a practical implementation of Linformer for PyTorch. | 51 | Established |
| 3 | lucidrains/fast-weight-attention | Implementation of Fast Weight Attention | 48 | Emerging |
| 4 | datalogue/keras-attention | Visualizing RNNs using the attention mechanism | 44 | Emerging |
| 5 | ematvey/hierarchical-attention-networks | Document classification with Hierarchical Attention Networks in TensorFlow... | 44 | Emerging |
| 6 | thushv89/attention_keras | Keras Layer implementation of Attention for Sequential models | 44 | Emerging |
| 7 | willGuimont/learnable_fourier_positional_encoding | Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding | 43 | Emerging |
| 8 | davidmascharka/tbd-nets | PyTorch implementation of "Transparency by Design: Closing the Gap Between... | 42 | Emerging |
| 9 | soskek/attention_is_all_you_need | Transformer of "Attention Is All You Need" (Vaswani et al. 2017) by Chainer. | 42 | Emerging |
| 10 | balavenkatesh3322/CV-pretrained-model | A collection of computer vision pre-trained models. | 41 | Emerging |
| 11 | kyegomez/FlashMHA | A simple PyTorch implementation of Flash MultiHead Attention | 41 | Emerging |
| 12 | brandokoch/attention-is-all-you-need-paper | Original transformer paper: Implementation of Vaswani, Ashish, et al.... | 41 | Emerging |
| 13 | kushalj001/pytorch-question-answering | Important paper implementations for Question Answering using PyTorch | 40 | Emerging |
| 14 | tlatkowski/multihead-siamese-nets | Implementation of Siamese Neural Networks built upon multihead attention... | 40 | Emerging |
| 15 | tensorflow/similarity | TensorFlow Similarity is a Python package focused on making similarity... | 38 | Emerging |
| 16 | Ugenteraan/Deep_Hierarchical_Classification | PyTorch Implementation of Deep Hierarchical Classification for Category... | 37 | Emerging |
| 17 | rockerBOO/lora-inspector | LoRA (Low-Rank Adaptation) inspector for Stable Diffusion | 37 | Emerging |
| 18 | macournoyer/neuralconvo | Neural conversational model in Torch | 36 | Emerging |
| 19 | Zhenye-Na/DA-RNN | 📃 Unofficial PyTorch Implementation of DA-RNN (arXiv:1704.02971) | 36 | Emerging |
| 20 | EdGENetworks/attention-networks-for-classification | Hierarchical Attention Networks for Document Classification in PyTorch | 36 | Emerging |
| 21 | opengeos/earthformer | A Python package for Earth forecasting transformer | 36 | Emerging |
| 22 | Rishit-dagli/Nystromformer | An implementation of the Nyströmformer, using the Nystrom method to approximate... | 36 | Emerging |
| 23 | lsdefine/attention-is-all-you-need-keras | A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need | 36 | Emerging |
| 24 | rentainhe/visualization | A collection of visualization functions | 35 | Emerging |
| 25 | poloclub/dodrio | Exploring attention weights in transformer-based models with linguistic knowledge. | 35 | Emerging |
| 26 | kyegomez/AoA-torch | Implementation of Attention on Attention in Zeta | 35 | Emerging |
| 27 | szagoruyko/attention-transfer | Improving Convolutional Networks via Attention Transfer (ICLR 2017) | 35 | Emerging |
| 28 | cbaziotis/neat-vision | Neat (Neural Attention) Vision is a visualization tool for the attention... | 34 | Emerging |
| 29 | tatp22/multidim-positional-encoding | An implementation of 1D, 2D, and 3D positional encoding in PyTorch and TensorFlow | 33 | Emerging |
| 30 | davidsvy/cosformer-pytorch | Unofficial PyTorch implementation of the paper "cosFormer: Rethinking... | 33 | Emerging |
| 31 | sara-nl/attention-sampling-pytorch | A PyTorch implementation of the paper "Processing Megapixel Images... | 33 | Emerging |
| 32 | soobinseo/Attentive-Neural-Process | A PyTorch Implementation of Attentive Neural Process | 32 | Emerging |
| 33 | castorini/MP-CNN-Torch | Multi-Perspective Convolutional Neural Networks for modeling textual... | 32 | Emerging |
| 34 | pandeykartikey/Hierarchical-Attention-Network | Implementation of Hierarchical Attention Networks in PyTorch | 31 | Emerging |
| 35 | MurrellGroup/InvariantPointAttention.jl | Julia implementation of AlphaFold 2's Invariant Point Attention | 30 | Emerging |
| 36 | kyegomez/ShallowFF | Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward... | 30 | Emerging |
| 37 | Saquib764/omini-kontext | An inference and training framework for multiple image input in Flux Kontext dev | 28 | Experimental |
| 38 | GalacticExchange/pretrained | Pretrained is the most complete and frequently updated list of pretrained... | 28 | Experimental |
| 39 | abcamiletto/mmit | A CV library in Python; design and experiment with models using any encoder... | 27 | Experimental |
| 40 | Akrielz/vision_models_playground | Playground for testing and implementing various Vision Models | 27 | Experimental |
| 41 | kyegomez/Tree-Attention-Torch | An implementation of Tree-Attention in PyTorch because it's in JAX for some reason | 26 | Experimental |
| 42 | Rishit-dagli/Compositional-Attention | An implementation of Compositional Attention: Disentangling Search and... | 26 | Experimental |
| 43 | billpsomas/efficient-probing | This repo contains the official implementation of the ICLR 2026 paper... | 26 | Experimental |
| 44 | esceptico/perceiver-io | Unofficial implementation of Perceiver IO | 26 | Experimental |
| 45 | SkBlaz/attviz | Dissecting Transformers via attention visualization | 25 | Experimental |
| 46 | Lanerra/DWARF | O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet... | 25 | Experimental |
| 47 | tobna/TaylorShift | This repository contains the code for the paper "TaylorShift: Shifting the... | 24 | Experimental |
| 48 | Awni00/abstract_transformer | Project repo associated with the paper "Disentangling and... | 23 | Experimental |
| 49 | m-a-n-i-f-e-s-t/power-attention | Attention Kernels for Symmetric Power Transformers | 23 | Experimental |
| 50 | sumo43/miniformer | Minimal Transformer re-implementation inspired by minGPT. Can be used as a... | 22 | Experimental |
| 51 | anmolg1997/LoRA-Factory | LoRA adapter lifecycle platform — DAG pipelines,... | 22 | Experimental |
| 52 | mzuhair9933/PoPE-pytorch | ⚙️ Implement polar coordinate positional embedding in PyTorch for efficient... | 22 | Experimental |
| 53 | Mogalina/transformer | Minimal Transformer implementation in pure C based on the architecture from... | 22 | Experimental |
| 54 | EricLBuehler/PerceiverIO-Classifier | A classifier based on PerceiverIO | 21 | Experimental |
| 55 | kyegomez/CT | Implementation of the attention and transformer from "Building Blocks for a... | 21 | Experimental |
| 56 | Rooooyy/HiTIN | Code for ACL 2023 paper "HiTIN: Hierarchy-aware Tree Isomorphism Network for... | 20 | Experimental |
| 57 | TiagoFilipeSousaGoncalves/survey-attention-medical-imaging | Implementation of the paper "A survey on attention mechanisms for medical... | 20 | Experimental |
| 58 | AlphafromZion/lora-lab | LoRA Training Config Generator — optimal configs for SDXL, FLUX,... | 19 | Experimental |
| 59 | BobMcDear/attention-in-vision | PyTorch implementation of popular attention mechanisms in vision | 17 | Experimental |
| 60 | MaitySubhajit/KArAt | Kolmogorov-Arnold Attention: Is Learnable Attention Better for Vision Transformers? | 17 | Experimental |
| 61 | hrbigelow/transformer-aiayn | The Transformer from "Attention is All You Need" | 16 | Experimental |
| 62 | btrojan-official/HypeLoRA | HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model... | 16 | Experimental |
| 63 | Iro96/Carbon | Carbon is a pure C++ Transformer framework inspired by GPT, featuring... | 16 | Experimental |
| 64 | ccfco/External-Attention-tensorflow | 🍀 TensorFlow implementation of various Attention Mechanisms, MLP,... | 16 | Experimental |
| 65 | IBM/DEFT | Official PyTorch code for "From PEFT to DEFT: Parameter Efficient Finetuning... | 15 | Experimental |
| 66 | sinpoce/ai-trainer-lite | 🤖 Train your own AI model in 3 steps \| text + image classification, tabular AutoML \| Gradio visual interface \| no GPU needed \| no ML background needed | 15 | Experimental |
| 67 | ross-sec/fractal_attention_analysis | A mathematical framework for analyzing transformer attention mechanisms... | 15 | Experimental |
| 68 | Nemesis-12/multihead-latent-attention | Implementation of Multi-head Latent Attention (MLA) from DeepSeek-V2 | 15 | Experimental |
| 69 | cnygaard/FractalHTransformer | Fractal Hierarchical Transformer: multi-resolution causal attention patterns... | 14 | Experimental |
| 70 | ebrahimpichka/attn-PG-RL-tsp | A PyTorch implementation of the attention-based Policy Gradient RL for... | 14 | Experimental |
| 71 | ghosthamlet/transformers-rs | Rust implementation of the paper: Attention Is All You... | 14 | Experimental |
| 72 | externalPointerVariable/AttentionIsAllYouNeed | Implementing Transformers from Scratch | 13 | Experimental |
| 73 | biswajitsahoo1111/D2L_Attention_Mechanisms_in_TF | This repository contains TensorFlow 2 code for the Attention Mechanisms chapter... | 12 | Experimental |
| 74 | SCCSMARTCODE/attention-is-all-you-need-from-scratch | A complete implementation of the Transformer architecture from scratch,... | 11 | Experimental |
| 75 | ducnt2406/AI-Headshot | Easy-to-use toolkit for training LoRA models with SimpleTuner, featuring a... | 11 | Experimental |
| 76 | romizone/simulasiLLM | 🧠 Interactive LLM Attention Simulation — Visualize how GPT-2 transformers... | 11 | Experimental |
| 77 | adi-mish/miniformer | Miniformer is a lightweight PyTorch transformer library for researchers,... | 11 | Experimental |
| 78 | vijaysai1102/polyglot-neural-architecture | A multimodal deep learning project that integrates SQL, MongoDB, Graph, and... | 11 | Experimental |
| 79 | priyanshujiiii/awesome-Attention | Resources and references on solved and unsolved problems in attention mechanisms. | 11 | Experimental |
| 80 | nexus-4/self-attention-mechanism | Implementation of self-attention mechanism based on the "Attention is all... | 11 | Experimental |
| 81 | pointlander/bento | An aware attention-free simplified image transformer | 10 | Experimental |
| 82 | TiagoFilipeSousaGoncalves/attention-mechanisms-healthcare | Implementation of the paper "Preliminary Study on the Impact of Attention... | 10 | Experimental |
| 83 | wanga90/halonet-pytorch | Implementation of the 😇 Attention layer from the paper, Scaling Local... | 10 | Experimental |
| 84 | zhengqigao/hbsattn | A high-performance Block Sparse Attention kernel in Triton | 10 | Experimental |