Attention Mechanism Implementations (ML Frameworks)

Implementations and tutorials of attention layers, attention mechanisms, and self-attention architectures for neural networks. Does NOT include broader transformer architectures, vision models, or applications that use attention as a component without focusing on the mechanism itself.
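For orientation, the core operation most of these repositories implement is scaled dot-product attention. A minimal NumPy sketch (written for this page, not taken from any listed project):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable row-wise softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of value vectors

# Self-attention: queries, keys, and values all come from the same sequence
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # 4 tokens, model dimension 8
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                     # (4, 8)
```

Projects below differ mainly in how they approximate or restructure this computation (Linformer, Nyströmformer, block-sparse kernels, etc.).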

84 attention mechanism implementation projects are tracked. Two score above 50 (the Established tier). The highest-rated is philipperemy/keras-attention at 67/100 with 2,815 stars. Only 1 of the top 10 is actively maintained.

Get the project list as JSON (the example below returns the top 20; raise `limit` to fetch all 84):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=attention-mechanism-implementations&limit=20"
```

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
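The same endpoint can be queried from a script. A small Python sketch using only the standard library; the endpoint and query parameters are taken from the curl example above, but the response schema is not documented here, so the helper returns the parsed JSON as-is:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit):
    """Assemble the query URL with the parameters shown in the docs above."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"

def fetch_projects(limit=84):
    """Fetch the project list; limit=84 retrieves every tracked project."""
    url = build_url("ml-frameworks", "attention-mechanism-implementations", limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Unauthenticated callers get 100 requests/day, so cache the response locally rather than re-fetching on every run.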

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | philipperemy/keras-attention | Keras Attention Layer (Luong and Bahdanau scores). | 67 | Established |
| 2 | tatp22/linformer-pytorch | My take on a practical implementation of Linformer for PyTorch. | 51 | Established |
| 3 | lucidrains/fast-weight-attention | Implementation of Fast Weight Attention | 48 | Emerging |
| 4 | datalogue/keras-attention | Visualizing RNNs using the attention mechanism | 44 | Emerging |
| 5 | ematvey/hierarchical-attention-networks | Document classification with Hierarchical Attention Networks in TensorFlow... | 44 | Emerging |
| 6 | thushv89/attention_keras | Keras Layer implementation of Attention for Sequential models | 44 | Emerging |
| 7 | willGuimont/learnable_fourier_positional_encoding | Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding | 43 | Emerging |
| 8 | davidmascharka/tbd-nets | PyTorch implementation of "Transparency by Design: Closing the Gap Between... | 42 | Emerging |
| 9 | soskek/attention_is_all_you_need | Transformer of "Attention Is All You Need" (Vaswani et al. 2017) by Chainer. | 42 | Emerging |
| 10 | balavenkatesh3322/CV-pretrained-model | A collection of computer vision pre-trained models. | 41 | Emerging |
| 11 | kyegomez/FlashMHA | A simple PyTorch implementation of Flash MultiHead Attention | 41 | Emerging |
| 12 | brandokoch/attention-is-all-you-need-paper | Original transformer paper: Implementation of Vaswani, Ashish, et al.... | 41 | Emerging |
| 13 | kushalj001/pytorch-question-answering | Important paper implementations for Question Answering using PyTorch | 40 | Emerging |
| 14 | tlatkowski/multihead-siamese-nets | Implementation of Siamese Neural Networks built upon multihead attention... | 40 | Emerging |
| 15 | tensorflow/similarity | TensorFlow Similarity is a Python package focused on making similarity... | 38 | Emerging |
| 16 | Ugenteraan/Deep_Hierarchical_Classification | PyTorch Implementation of Deep Hierarchical Classification for Category... | 37 | Emerging |
| 17 | rockerBOO/lora-inspector | LoRA (Low-Rank Adaptation) inspector for Stable Diffusion | 37 | Emerging |
| 18 | macournoyer/neuralconvo | Neural conversational model in Torch | 36 | Emerging |
| 19 | Zhenye-Na/DA-RNN | 📃 Unofficial PyTorch Implementation of DA-RNN (arXiv:1704.02971) | 36 | Emerging |
| 20 | EdGENetworks/attention-networks-for-classification | Hierarchical Attention Networks for Document Classification in PyTorch | 36 | Emerging |
| 21 | opengeos/earthformer | A Python package for Earth forecasting transformer | 36 | Emerging |
| 22 | Rishit-dagli/Nystromformer | An implementation of the Nyströmformer, using the Nystrom method to approximate... | 36 | Emerging |
| 23 | lsdefine/attention-is-all-you-need-keras | A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need | 36 | Emerging |
| 24 | rentainhe/visualization | A collection of visualization functions | 35 | Emerging |
| 25 | poloclub/dodrio | Exploring attention weights in transformer-based models with linguistic knowledge. | 35 | Emerging |
| 26 | kyegomez/AoA-torch | Implementation of Attention on Attention in Zeta | 35 | Emerging |
| 27 | szagoruyko/attention-transfer | Improving Convolutional Networks via Attention Transfer (ICLR 2017) | 35 | Emerging |
| 28 | cbaziotis/neat-vision | Neat (Neural Attention) Vision is a visualization tool for the attention... | 34 | Emerging |
| 29 | tatp22/multidim-positional-encoding | An implementation of 1D, 2D, and 3D positional encoding in PyTorch and TensorFlow | 33 | Emerging |
| 30 | davidsvy/cosformer-pytorch | Unofficial PyTorch implementation of the paper "cosFormer: Rethinking... | 33 | Emerging |
| 31 | sara-nl/attention-sampling-pytorch | A PyTorch implementation of the paper "Processing Megapixel Images... | 33 | Emerging |
| 32 | soobinseo/Attentive-Neural-Process | A PyTorch Implementation of Attentive Neural Process | 32 | Emerging |
| 33 | castorini/MP-CNN-Torch | Multi-Perspective Convolutional Neural Networks for modeling textual... | 32 | Emerging |
| 34 | pandeykartikey/Hierarchical-Attention-Network | Implementation of Hierarchical Attention Networks in PyTorch | 31 | Emerging |
| 35 | MurrellGroup/InvariantPointAttention.jl | Julia implementation of AlphaFold 2's Invariant Point Attention | 30 | Emerging |
| 36 | kyegomez/ShallowFF | Zeta implementation of "Rethinking Attention: Exploring Shallow Feed-Forward... | 30 | Emerging |
| 37 | Saquib764/omini-kontext | An inference and training framework for multiple image input in Flux Kontext dev | 28 | Experimental |
| 38 | GalacticExchange/pretrained | Pretrained is the most complete and frequently updated list of pretrained... | 28 | Experimental |
| 39 | abcamiletto/mmit | A CV library in Python; design and experiment with models using any encoder... | 27 | Experimental |
| 40 | Akrielz/vision_models_playground | Playground for testing and implementing various Vision Models | 27 | Experimental |
| 41 | kyegomez/Tree-Attention-Torch | An implementation of Tree-Attention in PyTorch because it's in JAX for some reason | 26 | Experimental |
| 42 | Rishit-dagli/Compositional-Attention | An implementation of Compositional Attention: Disentangling Search and... | 26 | Experimental |
| 43 | billpsomas/efficient-probing | This repo contains the official implementation of the ICLR 2026 paper... | 26 | Experimental |
| 44 | esceptico/perceiver-io | Unofficial implementation of Perceiver IO | 26 | Experimental |
| 45 | SkBlaz/attviz | Dissecting Transformers via attention visualization | 25 | Experimental |
| 46 | Lanerra/DWARF | O(N) attention with a bounded inference KV cache. D4 Daubechies wavelet... | 25 | Experimental |
| 47 | tobna/TaylorShift | This repository contains the code for the paper "TaylorShift: Shifting the... | 24 | Experimental |
| 48 | Awni00/abstract_transformer | Project repo associated with the paper "Disentangling and... | 23 | Experimental |
| 49 | m-a-n-i-f-e-s-t/power-attention | Attention Kernels for Symmetric Power Transformers | 23 | Experimental |
| 50 | sumo43/miniformer | Minimal Transformer re-implementation inspired by minGPT. Can be used as a... | 22 | Experimental |
| 51 | anmolg1997/LoRA-Factory | LoRA adapter lifecycle platform — DAG pipelines,... | 22 | Experimental |
| 52 | mzuhair9933/PoPE-pytorch | ⚙️ Implement polar coordinate positional embedding in PyTorch for efficient... | 22 | Experimental |
| 53 | Mogalina/transformer | Minimal Transformer implementation in pure C based on the architecture from... | 22 | Experimental |
| 54 | EricLBuehler/PerceiverIO-Classifier | A classifier based on PerceiverIO | 21 | Experimental |
| 55 | kyegomez/CT | Implementation of the attention and transformer from "Building Blocks for a... | 21 | Experimental |
| 56 | Rooooyy/HiTIN | Code for ACL 2023 paper "HiTIN: Hierarchy-aware Tree Isomorphism Network for... | 20 | Experimental |
| 57 | TiagoFilipeSousaGoncalves/survey-attention-medical-imaging | Implementation of the paper "A survey on attention mechanisms for medical... | 20 | Experimental |
| 58 | AlphafromZion/lora-lab | LoRA Training Config Generator — optimal configs for SDXL, FLUX,... | 19 | Experimental |
| 59 | BobMcDear/attention-in-vision | PyTorch implementation of popular attention mechanisms in vision | 17 | Experimental |
| 60 | MaitySubhajit/KArAt | Kolmogorov-Arnold Attention: Is Learnable Attention Better for Vision Transformers? | 17 | Experimental |
| 61 | hrbigelow/transformer-aiayn | The Transformer from "Attention is All You Need" | 16 | Experimental |
| 62 | btrojan-official/HypeLoRA | HypeLoRA: Hypernetwork-Generated LoRA Adapters for Calibrated Language Model... | 16 | Experimental |
| 63 | Iro96/Carbon | Carbon is a pure C++ Transformer framework inspired by GPT, featuring... | 16 | Experimental |
| 64 | ccfco/External-Attention-tensorflow | 🍀 TensorFlow implementation of various Attention Mechanisms, MLP,... | 16 | Experimental |
| 65 | IBM/DEFT | Official PyTorch code for "From PEFT to DEFT: Parameter Efficient Finetuning... | 15 | Experimental |
| 66 | sinpoce/ai-trainer-lite | 🤖 Train your own AI model in 3 steps \| text + image classification, tabular AutoML \| Gradio visual interface \| no GPU needed \| no ML background needed | 15 | Experimental |
| 67 | ross-sec/fractal_attention_analysis | A mathematical framework for analyzing transformer attention mechanisms... | 15 | Experimental |
| 68 | Nemesis-12/multihead-latent-attention | Implementation of Multi-head Latent Attention (MLA) from DeepSeek-V2 | 15 | Experimental |
| 69 | cnygaard/FractalHTransformer | Fractal Hierarchical Transformer: multi-resolution causal attention patterns... | 14 | Experimental |
| 70 | ebrahimpichka/attn-PG-RL-tsp | A PyTorch implementation of the attention-based Policy Gradient RL for... | 14 | Experimental |
| 71 | ghosthamlet/transformers-rs | Rust implementation of the paper: Attention Is All You... | 14 | Experimental |
| 72 | externalPointerVariable/AttentionIsAllYouNeed | Implementing Transformers from Scratch | 13 | Experimental |
| 73 | biswajitsahoo1111/D2L_Attention_Mechanisms_in_TF | This repository contains TensorFlow 2 code for the Attention Mechanisms chapter... | 12 | Experimental |
| 74 | SCCSMARTCODE/attention-is-all-you-need-from-scratch | A complete implementation of the Transformer architecture from scratch,... | 11 | Experimental |
| 75 | ducnt2406/AI-Headshot | Easy-to-use toolkit for training LoRA models with SimpleTuner, featuring a... | 11 | Experimental |
| 76 | romizone/simulasiLLM | 🧠 Interactive LLM Attention Simulation — Visualize how GPT-2 transformers... | 11 | Experimental |
| 77 | adi-mish/miniformer | Miniformer is a lightweight PyTorch transformer library for researchers,... | 11 | Experimental |
| 78 | vijaysai1102/polyglot-neural-architecture | A multimodal deep learning project that integrates SQL, MongoDB, Graph, and... | 11 | Experimental |
| 79 | priyanshujiiii/awesome-Attention | Resources and references on solved and unsolved problems in attention mechanisms. | 11 | Experimental |
| 80 | nexus-4/self-attention-mechanism | Implementation of self-attention mechanism based on the "Attention is all... | 11 | Experimental |
| 81 | pointlander/bento | An aware attention-free simplified image transformer | 10 | Experimental |
| 82 | TiagoFilipeSousaGoncalves/attention-mechanisms-healthcare | Implementation of the paper "Preliminary Study on the Impact of Attention... | 10 | Experimental |
| 83 | wanga90/halonet-pytorch | Implementation of the 😇 Attention layer from the paper, Scaling Local... | 10 | Experimental |
| 84 | zhengqigao/hbsattn | A high-performance Block Sparse Attention kernel in Triton | 10 | Experimental |