Transformer Architecture Tutorials

Educational implementations and hands-on learning resources covering transformer fundamentals, attention mechanisms, and core architecture components. Does NOT include domain-specific applications (math solving, embeddings, RL), research papers on transformer theory, or production-grade models.

267 transformer architecture tutorial projects are tracked. 1 scores above 70 (verified tier). The highest-rated is lucidrains/x-transformers at 79/100 with 5,808 stars. 1 of the top 10 is actively maintained.
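The tier labels appear to track score bands. A minimal sketch of the apparent mapping, with thresholds inferred from the rankings below (only the 70+ "verified" cutoff is stated; the other boundaries are assumptions read off the listed scores):

```python
def tier(score: int) -> str:
    """Map a 0-100 quality score to its tier label.

    Thresholds inferred from the listing: 70+ Verified (stated),
    50-69 Established, 30-49 Emerging, below 30 Experimental.
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"

# Spot-check against entries in the table below:
# lucidrains/x-transformers scores 79 (Verified),
# kanishkamisra/minicons scores 63 (Established).
```

This reproduces every score/tier pair in the table, but treat the 50 and 30 boundaries as guesses until confirmed against the API's own documentation.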

Get the ranked projects as JSON (the example below returns the top 20; raise the limit parameter to pull all 267):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-tutorials&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
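The same query can be issued from a script. A minimal sketch using only the Python standard library; the JSON response shape is not documented here, so fetch_projects simply decodes whatever the endpoint returns, and how an API key is supplied is not shown above, so no key handling is attempted:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_query(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the dataset query URL matching the curl example above."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{API_BASE}?{urlencode(params)}"

def fetch_projects(limit: int = 267):
    """Fetch and decode the ranked project list (subject to the rate limits above)."""
    url = build_query("transformers", "transformer-architecture-tutorials", limit)
    with urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Print the URL for the full 267-project pull without hitting the network.
    print(build_query("transformers", "transformer-architecture-tutorials", limit=267))
```

Unauthenticated calls count against the 100 requests/day limit, so caching the response locally is sensible when iterating.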

# Model Score Tier
1 lucidrains/x-transformers

A concise but complete full-attention transformer with a set of promising...

79
Verified
2 kanishkamisra/minicons

Utility for behavioral and representational analyses of Language Models

63
Established
3 lucidrains/dreamer4

Implementation of Danijar's latest iteration for his Dreamer line of work

62
Established
4 lucidrains/simple-hierarchical-transformer

Experiments around a simple idea for inducing multiple hierarchical...

58
Established
5 lucidrains/locoformer

LocoFormer - Generalist Locomotion via Long-Context Adaptation

53
Established
6 helpmefindaname/transformer-smaller-training-vocab

Temporarily remove unused tokens during training to save RAM and speed up training.

51
Established
7 kyegomez/attn_res

A clean, single-file PyTorch implementation of Attention Residuals (Kimi...

48
Emerging
8 allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in...

47
Emerging
9 kyegomez/zeta

Build high-performance AI models with modular building blocks

46
Emerging
10 Nicolepcx/Transformers-in-Action

This is the corresponding code for the book Transformers in Action

46
Emerging
11 tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant...

46
Emerging
12 tensorops/TransformerX

Flexible Python library providing building blocks (layers) for reproducible...

44
Emerging
13 Rishit-dagli/Fast-Transformer

An implementation of Additive Attention

44
Emerging
14 Rishit-dagli/Perceiver

Implementation of Perceiver, General Perception with Iterative Attention

43
Emerging
15 gordicaleksa/pytorch-original-transformer

My implementation of the original transformer model (Vaswani et al.). I've...

43
Emerging
16 KRR-Oxford/HierarchyTransformers

Language Models as Hierarchy Encoders

43
Emerging
17 kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers:...

43
Emerging
18 Emmi-AI/noether

Deep-learning framework for Engineering AI. Built on transformer building...

42
Emerging
19 HUSTAI/uie_pytorch

PyTorch implementation of the PaddleNLP UIE model

42
Emerging
20 dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in...

42
Emerging
21 bhavsarpratik/easy-transformers

Utility functions to work with transformers

41
Emerging
22 kyegomez/HLT

Implementation of the transformer from the paper: "Real-World Humanoid...

41
Emerging
23 cedrickchee/awesome-transformer-nlp

A curated list of NLP resources focused on Transformer networks, attention...

40
Emerging
24 jiwidi/Behavior-Sequence-Transformer-Pytorch

This is a pytorch implementation for the BST model from Alibaba...

40
Emerging
25 The-AI-Summer/self-attention-cv

Implementation of various self-attention mechanisms focused on computer...

40
Emerging
26 0x7o/RETRO-transformer

Easy-to-use Retrieval-Enhanced Transformer implementation

40
Emerging
27 haoliuhl/ringattention

Large Context Attention

40
Emerging
28 Lightning-Universe/lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning

38
Emerging
29 AlignmentResearch/tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer

38
Emerging
30 marella/ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.

38
Emerging
31 eduard23144/locoformer

🤖 Explore LocoFormer, a Transformer-XL model that enhances robot locomotion...

37
Emerging
32 chengzeyi/ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model...

37
Emerging
33 bodeby/torchstack

🫧 probability-level model ensembling for transformers

37
Emerging
34 K-H-Ismail/torchortho

[ICLR 2026] Polynomial, trigonometric, and tropical activations

37
Emerging
35 sgrvinod/chess-transformers

Teaching transformers to play chess

37
Emerging
36 google-research/long-range-arena

Long Range Arena for Benchmarking Efficient Transformers

37
Emerging
37 lxuechen/private-transformers

A codebase that makes differentially private training of transformers easy.

36
Emerging
38 jonrbates/turing

A PyTorch library for simulating Turing machines with neural networks, based...

36
Emerging
39 Rishit-dagli/Conformer

An implementation of Conformer: Convolution-augmented Transformer for Speech...

35
Emerging
40 softmax1/Flash-Attention-Softmax-N

CUDA and Triton implementations of Flash Attention with SoftmaxN.

35
Emerging
41 Gurumurthy30/Stackformer

Modular PyTorch transformer library for building, training, and...

35
Emerging
42 Beomi/InfiniTransformer

Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No...

34
Emerging
43 IvanBongiorni/maximal

A TensorFlow-compatible Python library that provides models and layers to...

34
Emerging
44 ziplab/LIT

[AAAI 2022] This is the official PyTorch implementation of "Less is More:...

33
Emerging
45 dingo-actual/infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind:...

33
Emerging
46 kreasof-ai/OpenFormer

A hackable library for running and fine-tuning modern transformer models on...

33
Emerging
47 deep-div/Custom-Transformer-Pytorch

A clean, ground-up implementation of the Transformer architecture in...

33
Emerging
48 prajjwal1/fluence

A deep learning library based on Pytorch focussed on low resource language...

33
Emerging
49 neulab/knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling...

33
Emerging
50 knotgrass/attention

several types of attention modules written in PyTorch for learning purposes

32
Emerging
51 rafiepour/CTran

Complete code for the proposed CNN-Transformer model for natural language...

32
Emerging
52 Geotrend-research/smaller-transformers

Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

32
Emerging
53 cyk1337/Transformer-in-PyTorch

Transformer/Transformer-XL/R-Transformer examples and explanations

32
Emerging
54 clovaai/length-adaptive-transformer

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

32
Emerging
55 nihalsangeeth/behaviour-seq-transformer

Pytorch implementation of "Behaviour Sequence Transformer for E-commerce...

32
Emerging
56 naokishibuya/simple_transformer

A Transformer Implementation that is easy to understand and customizable.

32
Emerging
57 templetwo/PhaseGPT

Kuramoto Phase-Coupled Oscillator Attention in Transformers

32
Emerging
58 The-Swarm-Corporation/Hyena-Y

A PyTorch implementation of the Hyena-Y model, a convolution-based...

32
Emerging
59 cosbidev/NAIM

Official implementation for the paper ``Not Another Imputation Method: A...

31
Emerging
60 ccdv-ai/convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing

31
Emerging
61 chef-transformer/chef-transformer

Chef Transformer 🍲.

31
Emerging
62 Kirill-Kravtsov/drophead-pytorch

An implementation of drophead regularization for pytorch transformers

31
Emerging
63 iil-postech/semantic-attention

Official implementation of "Attention-aware semantic communications for...

30
Emerging
64 mohyunho/NAS_transformer

Evolutionary Neural Architecture Search on Transformers for RUL Prediction

30
Emerging
65 mhw32/prototransformer-public

PyTorch implementation for "ProtoTransformer: A Meta-Learning Approach to...

30
Emerging
66 alexeykarnachev/full_stack_transformer

Pytorch library for end-to-end transformer models training, inference and serving

30
Emerging
67 warner-benjamin/commented-transformers

Highly commented implementations of Transformers in PyTorch

29
Experimental
68 frankaging/ReCOGS

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of...

29
Experimental
69 saeeddhqan/tiny-transformer

Tiny transformer models implemented in pytorch.

29
Experimental
70 antonyvigouret/Pay-Attention-to-MLPs

My implementation of the gMLP model from the paper "Pay Attention to MLPs".

29
Experimental
71 Selozhd/FNet-tensorflow

Tensorflow Implementation of "FNet: Mixing Tokens with Fourier Transforms."

29
Experimental
72 Baran-phys/Tropical-Attention

[NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic...

29
Experimental
73 jaketae/alibi

PyTorch implementation of Train Short, Test Long: Attention with Linear...

29
Experimental
74 maxxxzdn/erwin

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical...

28
Experimental
75 fattorib/fusedswiglu

Fused SwiGLU Triton kernels

28
Experimental
76 arshadshk/SAINT-pytorch

SAINT PyTorch implementation

28
Experimental
77 tgautam03/Transformers

A Gentle Introduction to Transformers Neural Network

28
Experimental
78 c00k1ez/plain-transformers

Transformer models implementation for training from scratch.

27
Experimental
79 AkiRusProd/numpy-transformer

A numpy implementation of the Transformer model in "Attention is All You Need"

27
Experimental
80 BubbleJoe-BrownU/TransformerHub

This is a repository of transformer-like models, including Transformer, GPT,...

27
Experimental
81 SakanaAI/evo-memory

Code to train and evaluate Neural Attention Memory Models to obtain...

27
Experimental
82 Agora-Lab-AI/HydraNet

HydraNet is a state-of-the-art transformer architecture that combines...

27
Experimental
83 will-thompson-k/tldr-transformers

The "tl;dr" on a few notable transformer papers (pre-2022).

27
Experimental
84 iKernels/transformers-lightning

A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses...

27
Experimental
85 kyegomez/Open-NAMM

An open source implementation of the paper: "AN EVOLVED UNIVERSAL TRANSFORMER MEMORY"

26
Experimental
86 Kareem404/hyper-connections

A minimal implementation of Manifold-Constrained Hyper-Connections (mHC)...

26
Experimental
87 kyegomez/Open-Olmo

Unofficial open-source PyTorch implementation of the OLMo Hybrid...

26
Experimental
88 telekom/transformer-tools

Transformers Training Tools

26
Experimental
89 mcbal/deep-implicit-attention

Implementation of deep implicit attention in PyTorch

26
Experimental
90 hasanisaeed/C-Transformer

Implementation of the core Transformer architecture in pure C

26
Experimental
91 ArneBinder/pytorch-ie-hydra-template-1

PyTorch-IE Hydra Template

25
Experimental
92 arshadshk/Last_Query_Transformer_RNN-PyTorch

Implementation of the paper "Last Query Transformer RNN for knowledge...

25
Experimental
93 fualsan/TransformerFromScratch

PyTorch Implementation of Transformer Deep Learning Model

25
Experimental
94 RJain12/choformer

Cho codon optimization WIP

25
Experimental
95 FareedKhan-dev/Understanding-Transformers-Step-by-Step-math-example

Understanding Large Language Transformer Architecture like a child

25
Experimental
96 codyjk/ChessGPT

♟️ A transformer that plays chess 🤖

25
Experimental
97 MurtyShikhar/TreeProjections

Tool to measure tree-structuredness of the internal algorithm learnt by a...

25
Experimental
98 chris-santiago/met

Reproducing the MET framework with PyTorch

25
Experimental
99 xdevfaheem/Transformers

A Comprehensive Implementation of Transformers Architecture from Scratch

25
Experimental
100 crscardellino/argumentation-mining-transformers

Argumentation Mining Transformers Module (AMTM) implementation.

24
Experimental
101 NiuTrans/Introduction-to-Transformers

An introduction to basic concepts of Transformers and key techniques of...

24
Experimental
102 mtanghu/LEAP

LEAP: Linear Explainable Attention in Parallel for causal language modeling...

24
Experimental
103 nullHawk/simple-transformer

Implementation of Transformer model in PyTorch

24
Experimental
104 KhaledSharif/robot-transformers

Train and evaluate an Action Chunking Transformer (ACT) to perform...

24
Experimental
105 ziansu/codeart

Official repo for FSE'24 paper "CodeArt: Better Code Models by Attention...

24
Experimental
106 vmarinowski/infini-attention

An unofficial pytorch implementation of 'Efficient Infinite Context...

24
Experimental
107 garyb9/pytorch-transformers

Transformers architecture code playground repository in python using PyTorch.

24
Experimental
108 mfekadu/nimbus-transformer

it's like Nimbus but uses a transformer language model

23
Experimental
109 Uokoroafor/transformer_from_scratch

This is a PyTorch implementation of the Transformer model in the paper...

23
Experimental
110 rishabkr/Attention-Is-All-You-Need-Explained-PyTorch

A paper implementation and tutorial from scratch combining various great...

23
Experimental
111 davide-coccomini/TimeSformer-Video-Classification

The notebook explains the various steps to obtain the results of...

23
Experimental
112 jaketae/tupe

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

23
Experimental
113 gmontamat/poor-mans-transformers

Implement Transformers (and Deep Learning) from scratch in NumPy

23
Experimental
114 bfilar/URLTran

PyTorch/HuggingFace Implementation of URLTran: Improving Phishing URL...

23
Experimental
115 trialandsuccess/verysimpletransformers

Very Simple Transformers provides a simplified interface for packaging,...

23
Experimental
116 mingikang31/Convolutional-Nearest-Neighbor-Attention

Convolutional Nearest Neighbor Attention for Transformers

22
Experimental
117 simboco/flash-linear-attention

💥 Optimize linear attention models with efficient Triton-based...

22
Experimental
118 Gala2044/Transformers-for-absolute-dummies

🚀 Master transformers with this simple guide that breaks down complex...

22
Experimental
119 kazuki-irie/kv-memory-brain

Official Code Repository for the paper "Key-value memory in the brain"

22
Experimental
120 allenai/staged-training

Staged Training for Transformer Language Models

22
Experimental
121 NTT123/sketch-transformer

Modeling Draw, Quick! dataset using transformers

22
Experimental
122 pelagecha/typ

Associative Memory Augmentation for Long-Context Retrieval in Transformers

22
Experimental
123 teddykoker/grokking

PyTorch implementation of "Grokking: Generalization Beyond Overfitting on...

22
Experimental
124 antofuller/configaformers

A python library for highly configurable transformers - easing model...

22
Experimental
125 mcbal/spin-model-transformers

Physics-inspired transformer modules based on mean-field dynamics of...

22
Experimental
126 dpressel/mint

MinT: Minimal Transformer Library and Tutorials

22
Experimental
127 rahul13ramesh/compositional_capabilities

Compositional Capabilities of Autoregressive Transformers: A Study on...

21
Experimental
128 osiriszjq/impulse_init

Convolutional Initialization for Data-Efficient Vision Transformers

21
Experimental
129 somosnlp/the-annotated-transformer

Spanish translation of the Harvard notebook "The Annotated Transformer"...

20
Experimental
130 erfanzar/OST-OpenSourceTransformers

OST Collection: An AI-powered suite of models that predict the next word...

20
Experimental
131 milistu/outformer

Clean Outputs from Language Models

20
Experimental
132 ArtificialZeng/transformers-Explained

Walkthrough of the official transformers source code. In the era of large AI models, PyTorch and transformers are the new operating system; everything else is software running on top of them.

20
Experimental
133 declare-lab/KNOT

This repository contains the implementation of the paper -- KNOT: Knowledge...

20
Experimental
134 hmohebbi/ValueZeroing

The official repo for the EACL 2023 paper "Quantifying Context Mixing in...

20
Experimental
135 dunktra/attention-binding-a11y

Code for tracking concept emergence via attention-head binding (EB*). Pythia...

20
Experimental
136 ArpitKadam/Attention-Is-All-You-Code

From Attention Mechanisms to Large Language Models — built from scratch.

20
Experimental
137 hereandnowai/transformers-simplified

Simplified, standalone Python scripts for transformer models, LLMs, TTS,...

20
Experimental
138 Brokttv/Transformer-from-scratch

elaborate transformer implementation + detailed explanation

19
Experimental
139 ays-dev/keras-transformer

Encoder-Decoder Transformer with cross-attention

19
Experimental
140 hrithickcodes/transformer-tf

This repository contains the code for the paper "Attention Is All You Need"...

19
Experimental
141 mingikang31/Fully-Convolutional-Transformers

FCT: Fully Convolutional Transformers

19
Experimental
142 KeepALifeUS/ml-attention-mechanisms

Flash Attention, RoPE, multi-head attention for temporal patterns

19
Experimental
143 Cobkgukgg/forgenn

Modern neural networks in pure NumPy - Transformers, ResNet, and more

19
Experimental
144 mtingers/kompoz

kompoz: Composable predicate and transform combinators with operator overloading

19
Experimental
145 marcolacagnina/transformer-for-code-analysis

PyTorch implementation of a Transformer Encoder to predict the Big O time...

19
Experimental
146 gheb02/chess-transformer

This repository implements a KV Cache mechanism in autoregressive...

19
Experimental
147 Johnpaul10j/Transformers-with-keras

Used the keras library to build a transformer using a sequence to sequence...

19
Experimental
148 jdmogollonp/tips-dpt-decoder

Implementation of DeepMind TIPS DPT Decoder

19
Experimental
149 Abhinand20/MathFormer

MathFormer - Solve math equations using NLP and transformers!

18
Experimental
150 osiriszjq/structured_init

Structured Initialization for Attention in Vision Transformers

18
Experimental
151 ansh-info/Titans-Learning-to-Memorize-at-Test-Time-with-Manim

Visual animated walkthroughs of the DeepMind "Titans: Learning to Memorize...

18
Experimental
152 Bradley-Butcher/Conformers

Unofficial implementation of Conformal Language Modeling by Quach et al

17
Experimental
153 princeton-nlp/dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages

17
Experimental
154 shreydan/scratchformers

building various transformer model architectures and its modules from scratch.

17
Experimental
155 afspies/attention-tutorial

Jupyter Notebook tutorial on Attention Mechanisms, Position Embeddings and...

17
Experimental
156 danadascalescu00/ioai-transformer-workshop

A hands-on introduction to Transformer architecture, designed for...

17
Experimental
157 0xOpenBytes/c

📦 Micro Composition using Transformations and Cache

17
Experimental
158 AMDonati/SMC-T-v2

Code for the paper "The Monte Carlo Transformer: a stochastic self-attention...

16
Experimental
159 shubhexists/transformers

basic implementation of transformers

16
Experimental
160 tech-srl/layer_norm_expressivity_role

Code for the paper "On the Expressivity Role of LayerNorm in Transformers'...

16
Experimental
161 Anne-Andresen/Multi-Modal-cuda-C-GAN

Raw C/cuda implementation of 3d GAN

16
Experimental
162 harrisonvshen/triton-accelerated-attention

Custom Triton GPU kernels for multi-head attention, including QK^T, softmax,...

16
Experimental
163 KOKOSde/sparse-clt

Cross-Layer Transcoder (CLT) library for extracting sparse interpretable...

16
Experimental
164 frikishaan/pytorch-transformers

This repository contains the original transformers model implementation code.

16
Experimental
165 NeuralCoder3/custom_infinite_craft

A custom implementation of Infinite Craft (https://neal.fun/infinite-craft/)

16
Experimental
166 BoCtrl-C/attention-rollout

Unofficial PyTorch implementation of Attention Rollout

15
Experimental
167 mcbal/afem

Implementation of approximate free-energy minimization in PyTorch

15
Experimental
168 hazdzz/converter

The official PyTorch implementation of Converter.

15
Experimental
169 homerjed/transformer_flows

Implementation of Apple ML's Transformer Flow (or TARFlow) from "Normalising...

15
Experimental
170 parham1998/Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling

Official Pytorch Implementation of: "Enhancing High-Vocabulary Image...

15
Experimental
171 shilongdai/ROT5

Small transformer trained from scratch

15
Experimental
172 thiomajid/distil_xlstm

Learning Attention Mechanisms through Recurrent Structures

15
Experimental
173 Jayluci4/micro-attention

Attention mechanism in ~50 lines - understand transformers by building from scratch

15
Experimental
174 Prakhar-Bhartiya/Transformers_From_Scratch

A walkthrough that builds a Transformer from first principles inside Jupyter...

15
Experimental
175 ArshockAbedan/Natural-Language-Processing-with-Attention-Models

Attention Models in NLP

15
Experimental
176 KOKOSde/sparse-transcoder

PyPI package for optimized sparse feature extraction from transformer...

15
Experimental
177 pavlosdais/Transformers-Linear-Algebra

Transformer Based Learning of Fundamental Linear Algebra Operations

15
Experimental
178 Mozeel-V/nebula-mini

Minimal PyTorch-based Nebula pipeline replica for malware behavior modeling

15
Experimental
179 tom-effernelli/small-LLM

Implementing the 'Attention is all you need' paper through a simple LLM model

15
Experimental
180 CESOIA/transformer-surgeon

Transformer models library with compression options

15
Experimental
181 dlukeh/transformer-deep-dive

A deep descent into the neural abyss — understanding transformers through...

14
Experimental
182 MrHenstep/NN_Self_Learn

Neural network architectures from perceptrons to GPT, built and trained from scratch

14
Experimental
183 abc1203/transformer-model

An implementation of the transformer deep learning model, based on the...

14
Experimental
184 ozyurtf/attention-and-transformers

The purpose of this project is to understand how the Transformers work and...

14
Experimental
185 macespinoza/mini-transformer-didactico

Didactic implementation of a Transformer Encoder–Decoder based on...

14
Experimental
186 M-e-r-c-u-r-y/pytorch-transformers

Collection of different types of transformers for learning purposes

14
Experimental
187 pranoyr/attention-models

Simplified Implementation of SOTA Deep Learning Papers in Pytorch

14
Experimental
188 bikhanal/transformers

The implementation of transformer as presented in the paper "Attention is...

14
Experimental
189 ghubnerr/attention-mechanisms

A compilation of most State-of-the-Art Attention Mechanisms: MHSA, MQA, GQA,...

14
Experimental
190 kyegomez/AttnWithConvolutions

Interleaved Attention's with convolutions for text modeling

13
Experimental
191 kyegomez/GATS

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in...

13
Experimental
192 mawright/pytorch-sparse-utils

Low-level utilities for Pytorch sparse tensors and operations

13
Experimental
193 gmongaras/Cottention_Transformer

Code for the paper "Cottention: Linear Transformers With Cosine Attention"

13
Experimental
194 Vadimbuildercxx/looped_transformer

Experimental implementation of "Looped Transformers are Better at Learning...

13
Experimental
195 rajveer43/titan_transformer

Unofficial implementation of titans transformer

13
Experimental
196 Lucasc-99/NoTorch

A from-scratch neural network and transformers library, with speeds rivaling PyTorch

13
Experimental
197 snoop2head/Deep-Encoder-Shallow-Decoder

🤗 Huggingface Implementation of Kasai et al(2020) "Deep Encoder, Shallow...

13
Experimental
198 NathanLeroux-git/OnlineTransformerWithSpikingNeurons

This code is the implementation of the Spiking Online Transformer of the...

13
Experimental
199 HySonLab/HierAttention

Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range...

13
Experimental
200 kyegomez/Mixture-of-MQA

An implementation of a switch transformer like Multi-query attention model

13
Experimental
201 yulang/phrasal-composition-in-transformers

This repo contains datasets and code for Assessing Phrasal Representation...

13
Experimental
202 SyedAkramaIrshad/transformer-grokking-lab

Tiny Transformer grokking experiment with live notebook visualizations.

13
Experimental
203 PeterJemley/Continuous-Depth-Transformers-with-Learned-Control-Dynamics

Hybrid transformer architecture replacing discrete layers with Neural ODE...

13
Experimental
204 tzhengtek/saute

SAUTE is a lightweight transformer-based architecture adapted for dialog modeling

13
Experimental
205 zzmtsvv/ad-gta

Grouped-Tied Attention by Zadouri, Strauss, Dao (2025).

13
Experimental
206 Omikrone/Mnemos

Mnemos is a mini-LLM based on Transformers, designed for training and...

13
Experimental
207 Carnetemperrado/x-transformers-rl

x-transformers-rl is a work-in-progress implementation of a transformer for...

12
Experimental
208 VinkuraAI/AXEN-M

AXEN-M (Attention eXtended Efficient Network - Model) is a powerful...

12
Experimental
209 awadalaa/transact

An unofficial implementation of "TransAct: Transformer-based Realtime User...

12
Experimental
210 moskomule/simple_transformers

Simple transformer implementations that I can understand

12
Experimental
211 SergioArnaud/attention-is-all-you-need

Implementation of a transformer following the Attention Is All You Need paper

12
Experimental
212 lorenzobalzani/nlp-dl-experiments

Python implementation of Deep Learning models, with a focus on NLP.

12
Experimental
213 agasheaditya/handson-transformers

End-to-end implementation of Transformers using PyTorch from scratch

12
Experimental
214 kyegomez/MultiQuerySuperpositionAttention

Multi-Query Attention with Sub-linear Masking, Superposition, and Entanglement

12
Experimental
215 Sarhamam/ZetaFormer

Curriculum learning framework that uses geometrically structured datasets...

12
Experimental
216 viktor-shcherb/qk-pca-analysis

PCA analysis of Q/K attention vectors to discover position-correlated...

12
Experimental
217 R2D2-08/turmachpy

A python package for simulating a variety of Turing machines.

12
Experimental
218 Sid7on1/Transformer-256dim

A powerful Transformer architecture built from scratch by Prajwal for...

12
Experimental
219 DzmitryPihulski/Encoder-transformer-from-scratch

Fully functional encoder transformer from tokenizer to lm-head

12
Experimental
220 tegridydev/hydraform

Self-Evolving Python Transformer Research

12
Experimental
221 hunterhammond-dev/attention-mechanisms-in-transformers

Learn and visualize attention mechanisms in transformer models — inspired by...

12
Experimental
222 viktor-shcherb/qk-sniffer

Capture sampled Q/K attention vectors from HF transformers into per-branch...

12
Experimental
223 pedrocurvo/HAET

HAET: Hierarchical Attention Erwin Transolver is a hybrid neural...

12
Experimental
224 NLP-Project-PoliMi-2025/NLP-Project

Can chess be tackled using NLP techniques? "Natural Language Processing"...

12
Experimental
225 arvind207kumar/Time-Cross-Adaptive-Self-Attention-TCSA-based-Imputation-model-

Time-Cross Adaptive Self-Attention (TCSA) model for multivariate Time...

12
Experimental
226 kanenorman/grassmann

Attempt at reproducing "Attention Is Not What You Need: Grassmann Flows as...

11
Experimental
227 richengguy/calc.ai

Transformer-based Calculator

11
Experimental
228 sarabesh/exploring-transformers

A typical repo, to contain code I am doing to learn transformers...

11
Experimental
229 Chamiln17/Transformer-From-Scratch

My implementation of the transformer architecture described in the paper...

11
Experimental
230 rashi-bhansali/encoder-decoder-transformer-variants-from-scratch

PyTorch implementation of Transformer encoder and GPT-style decoder with...

11
Experimental
231 chaowei312/HyperGraph-Sparse-Attention

Sparse attention via hypergraph partitioning for efficient long-context transformers

11
Experimental
232 wildanjr19/transformers-from-scratch

Implementing Transformers from the Attention Is All You Need paper from scratch.

11
Experimental
233 sathishkumar67/Byte-Latent-Transformer

Implementation of Byte Latent Transformer

11
Experimental
234 benearnthof/SparseTransformers

Reproducing the Paper Generating Long Sequences with Sparse Transformers by...

11
Experimental
235 adityakamat24/triton-fast-mha

A high-performance kernel implementation of multi-head attention using...

11
Experimental
236 albertkjoller/transformer-redundancy

Code for the paper "How Redundant Is the Transformer Stack in Speech...

11
Experimental
237 isakovaad/fedcsis25

A machine learning project to predict chess puzzle difficulty ratings using...

11
Experimental
238 AnkitaMungalpara/Building-DeepSeek-From-Scratch

This repository shows how to build a DeepSeek language model from scratch...

11
Experimental
239 balamarimuthu/deep-learning-with-pytorch

This repository contains a minimal PyTorch-based Transformer model...

11
Experimental
240 Joe-Naz01/transformers

A deep learning project that implements and explains the fundamental...

11
Experimental
241 Projects-Developer/Transformer-Models-For-NLP-Applications

Includes Source Code, PPT, Synopsis, Report, Documents, Base Research Paper...

11
Experimental
242 samaraxmmar/transformer-explained

A hands-on guide to understanding and building Transformer models from...

11
Experimental
243 graphcore-research/flash-attention-ipu

Poplar implementation of FlashAttention for IPU

11
Experimental
244 gustavecortal/transformer

Slides from my NLP course on the transformer architecture

11
Experimental
245 kikirizki/transformer

Minimalistic PyTorch implementation of transformer

11
Experimental
246 BramVanroy/lt3-2019-transformer-trainer

Transformer trainer for variety of classification problems that has been...

11
Experimental
247 lmxx1234567/goofy-hydra

Goofy Hydra is a Transport Layer Link Aggregator based on Transformer

11
Experimental
248 ytgui/SPT-proto

This repo includes a Sparse Transformer implementation which utilizes PQ to...

11
Experimental
249 Dhyanam04/ByteFetcher

This is ByteFetcher

11
Experimental
250 dariush-bahrami/mytransformers

My implementation of transformers

11
Experimental
251 ander-db/Transformers-PytorchLightning

👋 This is my implementation of the Transformer architecture from scratch...

11
Experimental
252 Jourdelune/Transformer

My implementation of the transformer architecture from the paper "Attention...

11
Experimental
253 maxime7770/Transformers-Insights

Exploring how Transformers actually transform the data under the hood

11
Experimental
254 ariva00/GaussianAttention4Matching

Code for the models described in the paper Localized Gaussians as...

11
Experimental
255 kyegomez/open-text-embedding-ada-002

This repository presents a production-grade implementation of a...

11
Experimental
256 Ranjit2111/Transformer-NMT

A PyTorch implementation of the Transformer architecture from "Attention Is...

11
Experimental
257 AlperYildirim1/Attention-is-All-You-Need-Pytorch

A fully reproducible, high-performance PyTorch Colab implementation of the...

10
Experimental
258 hash-ir/transformer-lab

Hands-on implementation of transformer and related models

10
Experimental
259 pplkit/AllYouNeedIsAttention

An efficient and robust implementation of the seminal "Attention Is All You...

10
Experimental
260 fatou1526/Pytorch_Transformers

This repo contains codes concerning pytorch models from how to define the...

10
Experimental
261 girishdhegde/NLP

Implementation of Deep Learning based Language Models from scratch in PyTorch

10
Experimental
262 microcoder-py/attn-is-all-you-need

A TFX implementation of the paper on transformers, Attention is All You Need

10
Experimental
263 shahrukhx01/transformers-bisected

A repo containing all building blocks of transformer model for text...

10
Experimental
264 Ipvikukiepki-KQS/progressive-transformers

A neural network architecture for building conversational agents

10
Experimental
265 devrahulbanjara/Transformers-from-Scratch

A repository implementing Transformers from scratch using PyTorch, designed...

10
Experimental
266 JHansiduYapa/Transformer-Model-from-Scratch

Build a Transformer model from scratch using Pytorch, implementing key...

10
Experimental
267 NipunRathore/NLP-Transformers-from-Scratch

Pre-training a Transformer from scratch.

10
Experimental