Transformer Architecture Tutorials

Educational implementations and hands-on learning resources covering transformer fundamentals, attention mechanisms, and core architecture components. Does NOT include domain-specific applications (math solving, embeddings, RL), research papers on transformer theory, or production-grade models.

267 transformer architecture tutorial projects are tracked. 1 scores above 70 (verified tier). The highest-rated is lucidrains/x-transformers at 79/100 with 5,808 stars. 1 of the top 10 is actively maintained.
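The tier labels appear to track score bands. A minimal sketch of the apparent mapping, with thresholds inferred from the rankings below (only the 70+ "verified" cutoff is stated; the other boundaries are assumptions read off the listed scores):

```python
def tier(score: int) -> str:
    """Map a 0-100 quality score to its tier label.

    Thresholds inferred from the listing: 70+ Verified (stated),
    50-69 Established, 30-49 Emerging, below 30 Experimental.
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"

# Spot-check against entries in the table below:
# lucidrains/x-transformers scores 79 (Verified),
# kanishkamisra/minicons scores 63 (Established).
```

This reproduces every score/tier pair in the table, but treat the 50 and 30 boundaries as guesses until confirmed against the API's own documentation.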

Get the ranked projects as JSON (the example below returns the top 20; raise the limit parameter to pull all 267):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-tutorials&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
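The same query can be issued from a script. A minimal sketch using only the Python standard library; the JSON response shape is not documented here, so fetch_projects simply decodes whatever the endpoint returns, and how an API key is supplied is not shown above, so no key handling is attempted:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_query(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the dataset query URL matching the curl example above."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{API_BASE}?{urlencode(params)}"

def fetch_projects(limit: int = 267):
    """Fetch and decode the ranked project list (subject to the rate limits above)."""
    url = build_query("transformers", "transformer-architecture-tutorials", limit)
    with urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Print the URL for the full 267-project pull without hitting the network.
    print(build_query("transformers", "transformer-architecture-tutorials", limit=267))
```

Unauthenticated calls count against the 100 requests/day limit, so caching the response locally is sensible when iterating.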

# Model Score Tier
1 lucidrains/x-transformers

A concise but complete full-attention transformer with a set of promising...

79
Verified
2 kanishkamisra/minicons

Utility for behavioral and representational analyses of Language Models

63
Established
3 lucidrains/dreamer4

Implementation of Danijar's latest iteration for his Dreamer line of work

62
Established
4 lucidrains/simple-hierarchical-transformer

Experiments around a simple idea for inducing multiple hierarchical...

58
Established
5 lucidrains/locoformer

LocoFormer - Generalist Locomotion via Long-Context Adaptation

53
Established
6 helpmefindaname/transformer-smaller-training-vocab

Temporarily remove unused tokens during training to save RAM and speed up training.

51
Established
7 kyegomez/attn_res

A clean, single-file PyTorch implementation of Attention Residuals (Kimi...

48
Emerging
8 allenai/smashed

SMASHED is a toolkit designed to apply transformations to samples in...

47
Emerging
9 kyegomez/zeta

Build high-performance AI models with modular building blocks

46
Emerging
10 Nicolepcx/Transformers-in-Action

This is the corresponding code for the book Transformers in Action

46
Emerging
11 tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant...

46
Emerging
12 tensorops/TransformerX

Flexible Python library providing building blocks (layers) for reproducible...

44
Emerging
13 Rishit-dagli/Fast-Transformer

An implementation of Additive Attention

44
Emerging
14 Rishit-dagli/Perceiver

Implementation of Perceiver, General Perception with Iterative Attention

43
Emerging
15 gordicaleksa/pytorch-original-transformer

My implementation of the original transformer model (Vaswani et al.). I've...

43
Emerging
16 KRR-Oxford/HierarchyTransformers

Language Models as Hierarchy Encoders

43
Emerging
17 kyegomez/SwitchTransformers

Implementation of Switch Transformers from the paper: "Switch Transformers:...

43
Emerging
18 Emmi-AI/noether

Deep-learning framework for Engineering AI. Built on transformer building...

42
Emerging
19 HUSTAI/uie_pytorch

PyTorch implementation of the PaddleNLP UIE model

42
Emerging
20 dell-research-harvard/linktransformer

A convenient way to link, deduplicate, aggregate and cluster data(frames) in...

42
Emerging
21 bhavsarpratik/easy-transformers

Utility functions to work with transformers

41
Emerging
22 kyegomez/HLT

Implementation of the transformer from the paper: "Real-World Humanoid...

41
Emerging
23 cedrickchee/awesome-transformer-nlp

A curated list of NLP resources focused on Transformer networks, attention...

40
Emerging
24 jiwidi/Behavior-Sequence-Transformer-Pytorch

This is a pytorch implementation for the BST model from Alibaba...

40
Emerging
25 The-AI-Summer/self-attention-cv

Implementation of various self-attention mechanisms focused on computer...

40
Emerging
26 0x7o/RETRO-transformer

Easy-to-use Retrieval-Enhanced Transformer implementation

40
Emerging
27 haoliuhl/ringattention

Large Context Attention

40
Emerging
28 Lightning-Universe/lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning

38
Emerging
29 AlignmentResearch/tuned-lens

Tools for understanding how transformer predictions are built layer-by-layer

38
Emerging
30 marella/ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.

38
Emerging
31 eduard23144/locoformer

🤖 Explore LocoFormer, a Transformer-XL model that enhances robot locomotion...

37
Emerging
32 chengzeyi/ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model...

37
Emerging
33 bodeby/torchstack

🫧 probability-level model ensembling for transformers

37
Emerging
34 K-H-Ismail/torchortho

[ICLR 2026] Polynomial, trigonometric, and tropical activations

37
Emerging
35 sgrvinod/chess-transformers

Teaching transformers to play chess

37
Emerging
36 google-research/long-range-arena

Long Range Arena for Benchmarking Efficient Transformers

37
Emerging
37 lxuechen/private-transformers

A codebase that makes differentially private training of transformers easy.

36
Emerging
38 jonrbates/turing

A PyTorch library for simulating Turing machines with neural networks, based...

36
Emerging
39 Rishit-dagli/Conformer

An implementation of Conformer: Convolution-augmented Transformer for Speech...

35
Emerging
40 softmax1/Flash-Attention-Softmax-N

CUDA and Triton implementations of Flash Attention with SoftmaxN.

35
Emerging
41 Gurumurthy30/Stackformer

Modular PyTorch transformer library for building, training, and...

35
Emerging
42 Beomi/InfiniTransformer

Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No...

34
Emerging
43 IvanBongiorni/maximal

A TensorFlow-compatible Python library that provides models and layers to...

34
Emerging
44 ziplab/LIT

[AAAI 2022] This is the official PyTorch implementation of "Less is More:...

33
Emerging
45 dingo-actual/infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind:...

33
Emerging
46 kreasof-ai/OpenFormer

A hackable library for running and fine-tuning modern transformer models on...

33
Emerging
47 deep-div/Custom-Transformer-Pytorch

A clean, ground-up implementation of the Transformer architecture in...

33
Emerging
48 prajjwal1/fluence

A deep learning library based on Pytorch focussed on low resource language...

33
Emerging
49 neulab/knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling...

33
Emerging
50 knotgrass/attention

several types of attention modules written in PyTorch for learning purposes

32
Emerging
51 rafiepour/CTran

Complete code for the proposed CNN-Transformer model for natural language...

32
Emerging
52 Geotrend-research/smaller-transformers

Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0.

32
Emerging
53 cyk1337/Transformer-in-PyTorch

Transformer/Transformer-XL/R-Transformer examples and explanations

32
Emerging
54 clovaai/length-adaptive-transformer

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

32
Emerging
55 nihalsangeeth/behaviour-seq-transformer

Pytorch implementation of "Behaviour Sequence Transformer for E-commerce...

32
Emerging
56 naokishibuya/simple_transformer

A Transformer Implementation that is easy to understand and customizable.

32
Emerging
57 templetwo/PhaseGPT

Kuramoto Phase-Coupled Oscillator Attention in Transformers

32
Emerging
58 The-Swarm-Corporation/Hyena-Y

A PyTorch implementation of the Hyena-Y model, a convolution-based...

32
Emerging
59 cosbidev/NAIM

Official implementation for the paper ``Not Another Imputation Method: A...

31
Emerging
60 ccdv-ai/convert_checkpoint_to_lsg

Efficient Attention for Long Sequence Processing

31
Emerging
61 chef-transformer/chef-transformer

Chef Transformer 🍲.

31
Emerging
62 Kirill-Kravtsov/drophead-pytorch

An implementation of drophead regularization for pytorch transformers

31
Emerging
63 iil-postech/semantic-attention

Official implementation of "Attention-aware semantic communications for...

30
Emerging
64 mohyunho/NAS_transformer

Evolutionary Neural Architecture Search on Transformers for RUL Prediction

30
Emerging
65 mhw32/prototransformer-public

PyTorch implementation for "ProtoTransformer: A Meta-Learning Approach to...

30
Emerging
66 alexeykarnachev/full_stack_transformer

Pytorch library for end-to-end transformer models training, inference and serving

30
Emerging
67 warner-benjamin/commented-transformers

Highly commented implementations of Transformers in PyTorch

29
Experimental
68 frankaging/ReCOGS

ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of...

29
Experimental
69 saeeddhqan/tiny-transformer

Tiny transformer models implemented in pytorch.

29
Experimental
70 antonyvigouret/Pay-Attention-to-MLPs

My implementation of the gMLP model from the paper "Pay Attention to MLPs".

29
Experimental
71 Selozhd/FNet-tensorflow

Tensorflow Implementation of "FNet: Mixing Tokens with Fourier Transforms."

29
Experimental
72 Baran-phys/Tropical-Attention

[NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic...

29
Experimental
73 jaketae/alibi

PyTorch implementation of Train Short, Test Long: Attention with Linear...

29
Experimental
74 maxxxzdn/erwin

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical...

28
Experimental
75 fattorib/fusedswiglu

Fused SwiGLU Triton kernels

28
Experimental
76 arshadshk/SAINT-pytorch

SAINT PyTorch implementation

28
Experimental
77 tgautam03/Transformers

A Gentle Introduction to Transformers Neural Network

28
Experimental
78 c00k1ez/plain-transformers

Transformer models implementation for training from scratch.

27
Experimental
79 AkiRusProd/numpy-transformer

A numpy implementation of the Transformer model in "Attention is All You Need"

27
Experimental
80 BubbleJoe-BrownU/TransformerHub

This is a repository of transformer-like models, including Transformer, GPT,...

27
Experimental
81 SakanaAI/evo-memory

Code to train and evaluate Neural Attention Memory Models to obtain...

27
Experimental
82 Agora-Lab-AI/HydraNet

HydraNet is a state-of-the-art transformer architecture that combines...

27
Experimental
83 will-thompson-k/tldr-transformers

The "tl;dr" on a few notable transformer papers (pre-2022).

27
Experimental
84 iKernels/transformers-lightning

A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses...

27
Experimental
85 kyegomez/Open-NAMM

An open source implementation of the paper: "AN EVOLVED UNIVERSAL TRANSFORMER MEMORY"

26
Experimental
86 Kareem404/hyper-connections

A minimal implementation of Manifold-Constrained Hyper-Connections (mHC)...

26
Experimental
87 kyegomez/Open-Olmo

Unofficial open-source PyTorch implementation of the OLMo Hybrid...

26
Experimental
88 telekom/transformer-tools

Transformers Training Tools

26
Experimental
89 mcbal/deep-implicit-attention

Implementation of deep implicit attention in PyTorch

26
Experimental
90 hasanisaeed/C-Transformer

Implementation of the core Transformer architecture in pure C

26
Experimental
91 ArneBinder/pytorch-ie-hydra-template-1

PyTorch-IE Hydra Template

25
Experimental
92 arshadshk/Last_Query_Transformer_RNN-PyTorch

Implementation of the paper "Last Query Transformer RNN for knowledge...

25
Experimental
93 fualsan/TransformerFromScratch

PyTorch Implementation of Transformer Deep Learning Model

25
Experimental
94 RJain12/choformer

Cho codon optimization WIP

25
Experimental
95 FareedKhan-dev/Understanding-Transformers-Step-by-Step-math-example

Understanding Large Language Transformer Architecture like a child

25
Experimental
96 codyjk/ChessGPT

♟️ A transformer that plays chess 🤖

25
Experimental
97 MurtyShikhar/TreeProjections

Tool to measure tree-structuredness of the internal algorithm learnt by a...

25
Experimental
98 chris-santiago/met

Reproducing the MET framework with PyTorch

25
Experimental
99 xdevfaheem/Transformers

A Comprehensive Implementation of Transformers Architecture from Scratch

25
Experimental
100 crscardellino/argumentation-mining-transformers

Argumentation Mining Transformers Module (AMTM) implementation.

24
Experimental
101 NiuTrans/Introduction-to-Transformers

An introduction to basic concepts of Transformers and key techniques of...

24
Experimental
102 mtanghu/LEAP

LEAP: Linear Explainable Attention in Parallel for causal language modeling...

24
Experimental
103 nullHawk/simple-transformer

Implementation of Transformer model in PyTorch

24
Experimental
104 KhaledSharif/robot-transformers

Train and evaluate an Action Chunking Transformer (ACT) to perform...

24
Experimental
105 ziansu/codeart

Official repo for FSE'24 paper "CodeArt: Better Code Models by Attention...

24
Experimental
106 vmarinowski/infini-attention

An unofficial pytorch implementation of 'Efficient Infinite Context...

24
Experimental
107 garyb9/pytorch-transformers

Transformers architecture code playground repository in python using PyTorch.

24
Experimental
108 mfekadu/nimbus-transformer

it's like Nimbus but uses a transformer language model

23
Experimental
109 Uokoroafor/transformer_from_scratch

This is a PyTorch implementation of the Transformer model in the paper...

23
Experimental
110 rishabkr/Attention-Is-All-You-Need-Explained-PyTorch

A paper implementation and tutorial from scratch combining various great...

23
Experimental
111 davide-coccomini/TimeSformer-Video-Classification

The notebook explains the various steps to obtain the results of...

23
Experimental
112 jaketae/tupe

PyTorch implementation of Rethinking Positional Encoding in Language Pre-training

23
Experimental
113 gmontamat/poor-mans-transformers

Implement Transformers (and Deep Learning) from scratch in NumPy

23
Experimental
114 bfilar/URLTran

PyTorch/HuggingFace Implementation of URLTran: Improving Phishing URL...

23
Experimental
115 trialandsuccess/verysimpletransformers

Very Simple Transformers provides a simplified interface for packaging,...

23
Experimental
116 mingikang31/Convolutional-Nearest-Neighbor-Attention

Convolutional Nearest Neighbor Attention for Transformers

22
Experimental
117 simboco/flash-linear-attention

💥 Optimize linear attention models with efficient Triton-based...

22
Experimental
118 Gala2044/Transformers-for-absolute-dummies

🚀 Master transformers with this simple guide that breaks down complex...

22
Experimental
119 kazuki-irie/kv-memory-brain

Official Code Repository for the paper "Key-value memory in the brain"

22
Experimental
120 allenai/staged-training

Staged Training for Transformer Language Models

22
Experimental
121 NTT123/sketch-transformer

Modeling Draw, Quick! dataset using transformers

22
Experimental
122 pelagecha/typ

Associative Memory Augmentation for Long-Context Retrieval in Transformers

22
Experimental
123 teddykoker/grokking

PyTorch implementation of "Grokking: Generalization Beyond Overfitting on...

22
Experimental
124 antofuller/configaformers

A python library for highly configurable transformers - easing model...

22
Experimental
125 mcbal/spin-model-transformers

Physics-inspired transformer modules based on mean-field dynamics of...

22
Experimental
126 dpressel/mint

MinT: Minimal Transformer Library and Tutorials

22
Experimental
127 rahul13ramesh/compositional_capabilities

Compositional Capabilities of Autoregressive Transformers: A Study on...

21
Experimental
128 osiriszjq/impulse_init

Convolutional Initialization for Data-Efficient Vision Transformers

21
Experimental
129 somosnlp/the-annotated-transformer

Spanish translation of the Harvard notebook "The Annotated Transformer"...

20
Experimental
130 erfanzar/OST-OpenSourceTransformers

OST Collection: An AI-powered suite of models that predict the next word...

20
Experimental
131 milistu/outformer

Clean Outputs from Language Models

20
Experimental
132 ArtificialZeng/transformers-Explained

Walkthrough of the official transformers source code. In the era of large AI models, PyTorch and transformers are the new operating system; everything else is software running on top of them.

20
Experimental
133 declare-lab/KNOT

This repository contains the implementation of the paper -- KNOT: Knowledge...

20
Experimental
134 hmohebbi/ValueZeroing

The official repo for the EACL 2023 paper "Quantifying Context Mixing in...

20
Experimental
135 dunktra/attention-binding-a11y

Code for tracking concept emergence via attention-head binding (EB*). Pythia...

20
Experimental
136 ArpitKadam/Attention-Is-All-You-Code

From Attention Mechanisms to Large Language Models — built from scratch.

20
Experimental
137 hereandnowai/transformers-simplified

Simplified, standalone Python scripts for transformer models, LLMs, TTS,...

20
Experimental
138 Brokttv/Transformer-from-scratch

elaborate transformer implementation + detailed explanation

19
Experimental
139 ays-dev/keras-transformer

Encoder-Decoder Transformer with cross-attention

19
Experimental
140 hrithickcodes/transformer-tf

This repository contains the code for the paper "Attention Is All You Need"...

19
Experimental
141 mingikang31/Fully-Convolutional-Transformers

FCT: Fully Convolutional Transformers

19
Experimental
142 KeepALifeUS/ml-attention-mechanisms

Flash Attention, RoPE, multi-head attention for temporal patterns

19
Experimental
143 Cobkgukgg/forgenn

Modern neural networks in pure NumPy - Transformers, ResNet, and more

19
Experimental
144 mtingers/kompoz

kompoz: Composable predicate and transform combinators with operator overloading

19
Experimental
145 marcolacagnina/transformer-for-code-analysis

PyTorch implementation of a Transformer Encoder to predict the Big O time...

19
Experimental
146 gheb02/chess-transformer

This repository implements a KV Cache mechanism in autoregressive...

19
Experimental
147 Johnpaul10j/Transformers-with-keras

Used the keras library to build a transformer using a sequence to sequence...

19
Experimental
148 jdmogollonp/tips-dpt-decoder

Implementation of DeepMind TIPS DPT Decoder

19
Experimental
149 Abhinand20/MathFormer

MathFormer - Solve math equations using NLP and transformers!

18
Experimental
150 osiriszjq/structured_init

Structured Initialization for Attention in Vision Transformers

18
Experimental
151 ansh-info/Titans-Learning-to-Memorize-at-Test-Time-with-Manim

Visual animated walkthroughs of the DeepMind "Titans: Learning to Memorize...

18
Experimental
152 Bradley-Butcher/Conformers

Unofficial implementation of Conformal Language Modeling by Quach et al

17
Experimental
153 princeton-nlp/dyck-transformer

[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages

17
Experimental
154 shreydan/scratchformers

building various transformer model architectures and its modules from scratch.

17
Experimental
155 afspies/attention-tutorial

Jupyter Notebook tutorial on Attention Mechanisms, Position Embeddings and...

17
Experimental
156 danadascalescu00/ioai-transformer-workshop

A hands-on introduction to Transformer architecture, designed for...

17
Experimental
157 0xOpenBytes/c

📦 Micro Composition using Transformations and Cache

17
Experimental
158 AMDonati/SMC-T-v2

Code for the paper "The Monte Carlo Transformer: a stochastic self-attention...

16
Experimental
159 shubhexists/transformers

basic implementation of transformers

16
Experimental
160 tech-srl/layer_norm_expressivity_role

Code for the paper "On the Expressivity Role of LayerNorm in Transformers'...

16
Experimental
161 Anne-Andresen/Multi-Modal-cuda-C-GAN

Raw C/cuda implementation of 3d GAN

16
Experimental
162 harrisonvshen/triton-accelerated-attention

Custom Triton GPU kernels for multi-head attention, including QK^T, softmax,...

16
Experimental
163 KOKOSde/sparse-clt

Cross-Layer Transcoder (CLT) library for extracting sparse interpretable...

16
Experimental
164 frikishaan/pytorch-transformers

This repository contains the original transformers model implementation code.

16
Experimental
165 NeuralCoder3/custom_infinite_craft

A custom implementation of Infinite Craft (https://neal.fun/infinite-craft/)

16
Experimental
166 BoCtrl-C/attention-rollout

Unofficial PyTorch implementation of Attention Rollout

15
Experimental
167 mcbal/afem

Implementation of approximate free-energy minimization in PyTorch

15
Experimental
168 hazdzz/converter

The official PyTorch implementation of Converter.

15
Experimental
169 homerjed/transformer_flows

Implementation of Apple ML's Transformer Flow (or TARFlow) from "Normalising...

15
Experimental
170 parham1998/Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling

Official Pytorch Implementation of: "Enhancing High-Vocabulary Image...

15
Experimental
171 shilongdai/ROT5

Small transformer trained from scratch

15
Experimental
172 thiomajid/distil_xlstm

Learning Attention Mechanisms through Recurrent Structures

15
Experimental
173 Jayluci4/micro-attention

Attention mechanism in ~50 lines - understand transformers by building from scratch

15
Experimental
174 Prakhar-Bhartiya/Transformers_From_Scratch

A walkthrough that builds a Transformer from first principles inside Jupyter...

15
Experimental
175 ArshockAbedan/Natural-Language-Processing-with-Attention-Models

Attention Models in NLP

15
Experimental
176 KOKOSde/sparse-transcoder

PyPI package for optimized sparse feature extraction from transformer...

15
Experimental
177 pavlosdais/Transformers-Linear-Algebra

Transformer Based Learning of Fundamental Linear Algebra Operations

15
Experimental
178 Mozeel-V/nebula-mini

Minimal PyTorch-based Nebula pipeline replica for malware behavior modeling

15
Experimental
179 tom-effernelli/small-LLM

Implementing the 'Attention is all you need' paper through a simple LLM model

15
Experimental
180 CESOIA/transformer-surgeon

Transformer models library with compression options

15
Experimental
181 dlukeh/transformer-deep-dive

A deep descent into the neural abyss — understanding transformers through...

14
Experimental
182 MrHenstep/NN_Self_Learn

Neural network architectures from perceptrons to GPT, built and trained from scratch

14
Experimental
183 abc1203/transformer-model

An implementation of the transformer deep learning model, based on the...

14
Experimental
184 ozyurtf/attention-and-transformers

The purpose of this project is to understand how the Transformers work and...

14
Experimental
185 macespinoza/mini-transformer-didactico

Didactic implementation of a Transformer Encoder–Decoder based on...

14
Experimental
186 M-e-r-c-u-r-y/pytorch-transformers

Collection of different types of transformers for learning purposes

14
Experimental
187 pranoyr/attention-models

Simplified Implementation of SOTA Deep Learning Papers in Pytorch

14
Experimental
188 bikhanal/transformers

The implementation of transformer as presented in the paper "Attention is...

14
Experimental
189 ghubnerr/attention-mechanisms

A compilation of most State-of-the-Art Attention Mechanisms: MHSA, MQA, GQA,...

14
Experimental
190 kyegomez/AttnWithConvolutions

Interleaved Attention's with convolutions for text modeling

13
Experimental
191 kyegomez/GATS

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in...

13
Experimental
192 mawright/pytorch-sparse-utils

Low-level utilities for Pytorch sparse tensors and operations

13
Experimental
193 gmongaras/Cottention_Transformer

Code for the paper "Cottention: Linear Transformers With Cosine Attention"

13
Experimental
194 Vadimbuildercxx/looped_transformer

Experimental implementation of "Looped Transformers are Better at Learning...

13
Experimental
195 rajveer43/titan_transformer

Unofficial implementation of titans transformer

13
Experimental
196 Lucasc-99/NoTorch

A from-scratch neural network and transformers library, with speeds rivaling PyTorch

13
Experimental
197 snoop2head/Deep-Encoder-Shallow-Decoder

🤗 Huggingface Implementation of Kasai et al(2020) "Deep Encoder, Shallow...

13
Experimental
198 NathanLeroux-git/OnlineTransformerWithSpikingNeurons

This code is the implementation of the Spiking Online Transformer of the...

13
Experimental
199 HySonLab/HierAttention

Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range...

13
Experimental
200 kyegomez/Mixture-of-MQA

An implementation of a switch transformer like Multi-query attention model

13
Experimental
201 yulang/phrasal-composition-in-transformers

This repo contains datasets and code for Assessing Phrasal Representation...

13
Experimental
202 SyedAkramaIrshad/transformer-grokking-lab

Tiny Transformer grokking experiment with live notebook visualizations.

13
Experimental
203 PeterJemley/Continuous-Depth-Transformers-with-Learned-Control-Dynamics

Hybrid transformer architecture replacing discrete layers with Neural ODE...

13
Experimental
204 tzhengtek/saute

SAUTE is a lightweight transformer-based architecture adapted for dialog modeling

13
Experimental
205 zzmtsvv/ad-gta

Grouped-Tied Attention by Zadouri, Strauss, Dao (2025).

13
Experimental
206 Omikrone/Mnemos

Mnemos is a mini-LLM based on Transformers, designed for training and...

13
Experimental
207 Carnetemperrado/x-transformers-rl

x-transformers-rl is a work-in-progress implementation of a transformer for...

12
Experimental
208 VinkuraAI/AXEN-M

AXEN-M (Attention eXtended Efficient Network - Model) is a powerful...

12
Experimental
209 awadalaa/transact

An unofficial implementation of "TransAct: Transformer-based Realtime User...

12
Experimental
210 moskomule/simple_transformers

Simple transformer implementations that I can understand

12
Experimental
211 SergioArnaud/attention-is-all-you-need

Implementation of a transformer following the Attention Is All You Need paper

12
Experimental
212 lorenzobalzani/nlp-dl-experiments

Python implementation of Deep Learning models, with a focus on NLP.

12
Experimental
213 agasheaditya/handson-transformers

End-to-end implementation of Transformers using PyTorch from scratch

12
Experimental
214 kyegomez/MultiQuerySuperpositionAttention

Multi-Query Attention with Sub-linear Masking, Superposition, and Entanglement

12
Experimental
215 Sarhamam/ZetaFormer

Curriculum learning framework that uses geometrically structured datasets...

12
Experimental
216 viktor-shcherb/qk-pca-analysis

PCA analysis of Q/K attention vectors to discover position-correlated...

12
Experimental
217 R2D2-08/turmachpy

A python package for simulating a variety of Turing machines.

12
Experimental
218 Sid7on1/Transformer-256dim

A powerful Transformer architecture built from scratch by Prajwal for...

12
Experimental
219 DzmitryPihulski/Encoder-transformer-from-scratch

Fully functional encoder transformer from tokenizer to lm-head

12
Experimental
220 tegridydev/hydraform

Self-Evolving Python Transformer Research

12
Experimental
221 hunterhammond-dev/attention-mechanisms-in-transformers

Learn and visualize attention mechanisms in transformer models — inspired by...

12
Experimental
222 viktor-shcherb/qk-sniffer

Capture sampled Q/K attention vectors from HF transformers into per-branch...

12
Experimental
223 pedrocurvo/HAET

HAET: Hierarchical Attention Erwin Transolver is a hybrid neural...

12
Experimental
224 NLP-Project-PoliMi-2025/NLP-Project

Can chess be tackled using NLP techniques? "Natural Language Processing"...

12
Experimental
225 arvind207kumar/Time-Cross-Adaptive-Self-Attention-TCSA-based-Imputation-model-

Time-Cross Adaptive Self-Attention (TCSA) model for multivariate Time...

12
Experimental
226 kanenorman/grassmann

Attempt at reproducing "Attention Is Not What You Need: Grassmann Flows as...

11
Experimental
227 richengguy/calc.ai

Transformer-based Calculator

11
Experimental
228 sarabesh/exploring-transformers

A typical repo, to contain code I am doing to learn transformers...

11
Experimental
229 Chamiln17/Transformer-From-Scratch

My implementation of the transformer architecture described in the paper...

11
Experimental
230 rashi-bhansali/encoder-decoder-transformer-variants-from-scratch

PyTorch implementation of Transformer encoder and GPT-style decoder with...

11
Experimental
231 chaowei312/HyperGraph-Sparse-Attention

Sparse attention via hypergraph partitioning for efficient long-context transformers

11
Experimental
232 wildanjr19/transformers-from-scratch

Implementing Transformers from the Attention Is All You Need paper from scratch.

11
Experimental
233 sathishkumar67/Byte-Latent-Transformer

Implementation of Byte Latent Transformer

11
Experimental
234 benearnthof/SparseTransformers

Reproducing the Paper Generating Long Sequences with Sparse Transformers by...

11
Experimental
235 adityakamat24/triton-fast-mha

A high-performance kernel implementation of multi-head attention using...

11
Experimental
236 albertkjoller/transformer-redundancy

Code for the paper "How Redundant Is the Transformer Stack in Speech...

11
Experimental
237 isakovaad/fedcsis25

A machine learning project to predict chess puzzle difficulty ratings using...

11
Experimental
238 AnkitaMungalpara/Building-DeepSeek-From-Scratch

This repository shows how to build a DeepSeek language model from scratch...

11
Experimental
239 balamarimuthu/deep-learning-with-pytorch

This repository contains a minimal PyTorch-based Transformer model...

11
Experimental
240 Joe-Naz01/transformers

A deep learning project that implements and explains the fundamental...

11
Experimental
241 Projects-Developer/Transformer-Models-For-NLP-Applications

Includes Source Code, PPT, Synopsis, Report, Documents, Base Research Paper...

11
Experimental
242 samaraxmmar/transformer-explained

A hands-on guide to understanding and building Transformer models from...

11
Experimental
243 graphcore-research/flash-attention-ipu

Poplar implementation of FlashAttention for IPU

11
Experimental
244 gustavecortal/transformer

Slides from my NLP course on the transformer architecture

11
Experimental
245 kikirizki/transformer

Minimalistic PyTorch implementation of transformer

11
Experimental
246 BramVanroy/lt3-2019-transformer-trainer

Transformer trainer for variety of classification problems that has been...

11
Experimental
247 lmxx1234567/goofy-hydra

Goofy Hydra is a Transport Layer Link Aggregator based on Transformer

11
Experimental
248 ytgui/SPT-proto

This repo includes a Sparse Transformer implementation which utilizes PQ to...

11
Experimental
249 Dhyanam04/ByteFetcher

This is ByteFetcher

11
Experimental
250 dariush-bahrami/mytransformers

My implementation of transformers

11
Experimental
251 ander-db/Transformers-PytorchLightning

👋 This is my implementation of the Transformer architecture from scratch...

11
Experimental
252 Jourdelune/Transformer

My implementation of the transformer architecture from the paper "Attention...

11
Experimental
253 maxime7770/Transformers-Insights

Exploring how Transformers actually transform the data under the hood

11
Experimental
254 ariva00/GaussianAttention4Matching

Code for the models described in the paper Localized Gaussians as...

11
Experimental
255 kyegomez/open-text-embedding-ada-002

This repository presents a production-grade implementation of a...

11
Experimental
256 Ranjit2111/Transformer-NMT

A PyTorch implementation of the Transformer architecture from "Attention Is...

11
Experimental
257 AlperYildirim1/Attention-is-All-You-Need-Pytorch

A fully reproducible, high-performance PyTorch Colab implementation of the...

10
Experimental
258 hash-ir/transformer-lab

Hands-on implementation of transformer and related models

10
Experimental
259 pplkit/AllYouNeedIsAttention

An efficient and robust implementation of the seminal "Attention Is All You...

10
Experimental
260 fatou1526/Pytorch_Transformers

This repo contains codes concerning pytorch models from how to define the...

10
Experimental
261 girishdhegde/NLP

Implementation of Deep Learning based Language Models from scratch in PyTorch

10
Experimental
262 microcoder-py/attn-is-all-you-need

A TFX implementation of the paper on transformers, Attention is All You Need

10
Experimental
263 shahrukhx01/transformers-bisected

A repo containing all building blocks of transformer model for text...

10
Experimental
264 Ipvikukiepki-KQS/progressive-transformers

A neural network architecture for building conversational agents

10
Experimental
265 devrahulbanjara/Transformers-from-Scratch

A repository implementing Transformers from scratch using PyTorch, designed...

10
Experimental
266 JHansiduYapa/Transformer-Model-from-Scratch

Build a Transformer model from scratch using Pytorch, implementing key...

10
Experimental
267 NipunRathore/NLP-Transformers-from-Scratch

Pre-training a Transformer from scratch.

10
Experimental