Transformer Architecture Tutorials
Educational implementations and hands-on learning resources covering transformer fundamentals, attention mechanisms, and core architecture components. Does NOT include domain-specific applications (math solving, embeddings, RL), research papers on transformer theory, or production-grade models.
This index tracks 267 transformer architecture tutorial projects. One scores above 70 (verified tier). The highest-rated is lucidrains/x-transformers at 79/100 with 5,808 stars. One of the top 10 is actively maintained.
Get all 267 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-tutorials&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
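The curl command above can also be reproduced from Python with the standard library. The sketch below builds the same query URL; the `api_key` parameter name is an assumption (the page does not document how the key is passed), so check the API docs before relying on it.

```python
from urllib.parse import urlencode

# Base endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def quality_url(domain, subcategory, limit=20, api_key=None):
    """Build the dataset query URL used in the curl example."""
    params = {"domain": domain, "subcategory": subcategory, "limit": str(limit)}
    if api_key:
        # Hypothetical parameter name -- the real mechanism may be a header.
        params["api_key"] = api_key
    return f"{BASE_URL}?{urlencode(params)}"

url = quality_url("transformers", "transformer-architecture-tutorials")
# To actually fetch (requires network access):
#   import json, urllib.request
#   projects = json.load(urllib.request.urlopen(url))
```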
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | lucidrains/x-transformers | A concise but complete full-attention transformer with a set of promising... | 79 | Verified |
| 2 | kanishkamisra/minicons | Utility for behavioral and representational analyses of language models | | Established |
| 3 | lucidrains/dreamer4 | Implementation of Danijar's latest iteration for his Dreamer line of work | | Established |
| 4 | lucidrains/simple-hierarchical-transformer | Experiments around a simple idea for inducing multiple hierarchical... | | Established |
| 5 | lucidrains/locoformer | LocoFormer - Generalist Locomotion via Long-Context Adaptation | | Established |
| 6 | helpmefindaname/transformer-smaller-training-vocab | Temporarily remove unused tokens during training to save RAM and speed. | | Established |
| 7 | kyegomez/attn_res | A clean, single-file PyTorch implementation of Attention Residuals (Kimi... | | Emerging |
| 8 | allenai/smashed | SMASHED is a toolkit designed to apply transformations to samples in... | | Emerging |
| 9 | kyegomez/zeta | Build high-performance AI models with modular building blocks | | Emerging |
| 10 | Nicolepcx/Transformers-in-Action | Companion code for the book Transformers in Action | | Emerging |
| 11 | tomaarsen/attention_sinks | Extend existing LLMs way beyond the original training length with constant... | | Emerging |
| 12 | tensorops/TransformerX | Flexible Python library providing building blocks (layers) for reproducible... | | Emerging |
| 13 | Rishit-dagli/Fast-Transformer | An implementation of Additive Attention | | Emerging |
| 14 | Rishit-dagli/Perceiver | Implementation of Perceiver, General Perception with Iterative Attention | | Emerging |
| 15 | gordicaleksa/pytorch-original-transformer | My implementation of the original transformer model (Vaswani et al.). I've... | | Emerging |
| 16 | KRR-Oxford/HierarchyTransformers | Language Models as Hierarchy Encoders | | Emerging |
| 17 | kyegomez/SwitchTransformers | Implementation of Switch Transformers from the paper "Switch Transformers:... | | Emerging |
| 18 | Emmi-AI/noether | Deep-learning framework for Engineering AI. Built on transformer building... | | Emerging |
| 19 | HUSTAI/uie_pytorch | PyTorch implementation of the PaddleNLP UIE model | | Emerging |
| 20 | dell-research-harvard/linktransformer | A convenient way to link, deduplicate, aggregate and cluster data(frames) in... | | Emerging |
| 21 | bhavsarpratik/easy-transformers | Utility functions to work with transformers | | Emerging |
| 22 | kyegomez/HLT | Implementation of the transformer from the paper "Real-World Humanoid... | | Emerging |
| 23 | cedrickchee/awesome-transformer-nlp | A curated list of NLP resources focused on Transformer networks, attention... | | Emerging |
| 24 | jiwidi/Behavior-Sequence-Transformer-Pytorch | A PyTorch implementation of the BST model from Alibaba... | | Emerging |
| 25 | The-AI-Summer/self-attention-cv | Implementation of various self-attention mechanisms focused on computer... | | Emerging |
| 26 | 0x7o/RETRO-transformer | Easy-to-use Retrieval-Enhanced Transformer implementation | | Emerging |
| 27 | haoliuhl/ringattention | Large Context Attention | | Emerging |
| 28 | Lightning-Universe/lightning-transformers | Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning | | Emerging |
| 29 | AlignmentResearch/tuned-lens | Tools for understanding how transformer predictions are built layer by layer | | Emerging |
| 30 | marella/ctransformers | Python bindings for Transformer models implemented in C/C++ using the GGML library. | | Emerging |
| 31 | eduard23144/locoformer | 🤖 Explore LocoFormer, a Transformer-XL model that enhances robot locomotion... | | Emerging |
| 32 | chengzeyi/ParaAttention | https://wavespeed.ai/ Context parallel attention that accelerates DiT model... | | Emerging |
| 33 | bodeby/torchstack | 🫧 Probability-level model ensembling for transformers | | Emerging |
| 34 | K-H-Ismail/torchortho | [ICLR 2026] Polynomial, trigonometric, and tropical activations | | Emerging |
| 35 | sgrvinod/chess-transformers | Teaching transformers to play chess | | Emerging |
| 36 | google-research/long-range-arena | Long Range Arena for Benchmarking Efficient Transformers | | Emerging |
| 37 | lxuechen/private-transformers | A codebase that makes differentially private training of transformers easy. | | Emerging |
| 38 | jonrbates/turing | A PyTorch library for simulating Turing machines with neural networks, based... | | Emerging |
| 39 | Rishit-dagli/Conformer | An implementation of Conformer: Convolution-augmented Transformer for Speech... | | Emerging |
| 40 | softmax1/Flash-Attention-Softmax-N | CUDA and Triton implementations of Flash Attention with SoftmaxN. | | Emerging |
| 41 | Gurumurthy30/Stackformer | Modular PyTorch transformer library for building, training, and... | | Emerging |
| 42 | Beomi/InfiniTransformer | Unofficial PyTorch/🤗Transformers (Gemma/Llama3) implementation of Leave No... | | Emerging |
| 43 | IvanBongiorni/maximal | A TensorFlow-compatible Python library that provides models and layers to... | | Emerging |
| 44 | ziplab/LIT | [AAAI 2022] The official PyTorch implementation of "Less is More:... | | Emerging |
| 45 | dingo-actual/infini-transformer | PyTorch implementation of Infini-Transformer from "Leave No Context Behind:... | | Emerging |
| 46 | kreasof-ai/OpenFormer | A hackable library for running and fine-tuning modern transformer models on... | | Emerging |
| 47 | deep-div/Custom-Transformer-Pytorch | A clean, ground-up implementation of the Transformer architecture in... | | Emerging |
| 48 | prajjwal1/fluence | A deep learning library based on PyTorch focused on low-resource language... | | Emerging |
| 49 | neulab/knn-transformers | PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling... | | Emerging |
| 50 | knotgrass/attention | Several types of attention modules written in PyTorch for learning purposes | | Emerging |
| 51 | rafiepour/CTran | Complete code for the proposed CNN-Transformer model for natural language... | | Emerging |
| 52 | Geotrend-research/smaller-transformers | Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0. | | Emerging |
| 53 | cyk1337/Transformer-in-PyTorch | Transformer/Transformer-XL/R-Transformer examples and explanations | | Emerging |
| 54 | clovaai/length-adaptive-transformer | Official PyTorch implementation of Length-Adaptive Transformer (ACL 2021) | | Emerging |
| 55 | nihalsangeeth/behaviour-seq-transformer | PyTorch implementation of "Behaviour Sequence Transformer for E-commerce... | | Emerging |
| 56 | naokishibuya/simple_transformer | A Transformer implementation that is easy to understand and customizable. | | Emerging |
| 57 | templetwo/PhaseGPT | Kuramoto Phase-Coupled Oscillator Attention in Transformers | | Emerging |
| 58 | The-Swarm-Corporation/Hyena-Y | A PyTorch implementation of the Hyena-Y model, a convolution-based... | | Emerging |
| 59 | cosbidev/NAIM | Official implementation for the paper "Not Another Imputation Method: A... | | Emerging |
| 60 | ccdv-ai/convert_checkpoint_to_lsg | Efficient Attention for Long Sequence Processing | | Emerging |
| 61 | chef-transformer/chef-transformer | Chef Transformer 🍲 | | Emerging |
| 62 | Kirill-Kravtsov/drophead-pytorch | An implementation of drophead regularization for PyTorch transformers | | Emerging |
| 63 | iil-postech/semantic-attention | Official implementation of "Attention-aware semantic communications for... | | Emerging |
| 64 | mohyunho/NAS_transformer | Evolutionary Neural Architecture Search on Transformers for RUL Prediction | | Emerging |
| 65 | mhw32/prototransformer-public | PyTorch implementation for "ProtoTransformer: A Meta-Learning Approach to... | | Emerging |
| 66 | alexeykarnachev/full_stack_transformer | PyTorch library for end-to-end transformer model training, inference and serving | | Emerging |
| 67 | warner-benjamin/commented-transformers | Highly commented implementations of Transformers in PyTorch | | Experimental |
| 68 | frankaging/ReCOGS | ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of... | | Experimental |
| 69 | saeeddhqan/tiny-transformer | Tiny transformer models implemented in PyTorch. | | Experimental |
| 70 | antonyvigouret/Pay-Attention-to-MLPs | My implementation of the gMLP model from the paper "Pay Attention to MLPs". | | Experimental |
| 71 | Selozhd/FNet-tensorflow | TensorFlow implementation of "FNet: Mixing Tokens with Fourier Transforms." | | Experimental |
| 72 | Baran-phys/Tropical-Attention | [NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic... | | Experimental |
| 73 | jaketae/alibi | PyTorch implementation of Train Short, Test Long: Attention with Linear... | | Experimental |
| 74 | maxxxzdn/erwin | Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical... | | Experimental |
| 75 | fattorib/fusedswiglu | Fused SwiGLU Triton kernels | | Experimental |
| 76 | arshadshk/SAINT-pytorch | SAINT PyTorch implementation | | Experimental |
| 77 | tgautam03/Transformers | A Gentle Introduction to the Transformer Neural Network | | Experimental |
| 78 | c00k1ez/plain-transformers | Transformer model implementations for training from scratch. | | Experimental |
| 79 | AkiRusProd/numpy-transformer | A NumPy implementation of the Transformer model in "Attention Is All You Need" | | Experimental |
| 80 | BubbleJoe-BrownU/TransformerHub | A repository of transformer-like models, including Transformer, GPT,... | | Experimental |
| 81 | SakanaAI/evo-memory | Code to train and evaluate Neural Attention Memory Models to obtain... | | Experimental |
| 82 | Agora-Lab-AI/HydraNet | HydraNet is a state-of-the-art transformer architecture that combines... | | Experimental |
| 83 | will-thompson-k/tldr-transformers | The "tl;dr" on a few notable transformer papers (pre-2022). | | Experimental |
| 84 | iKernels/transformers-lightning | A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses... | | Experimental |
| 85 | kyegomez/Open-NAMM | An open-source implementation of the paper "An Evolved Universal Transformer Memory" | | Experimental |
| 86 | Kareem404/hyper-connections | A minimal implementation of Manifold-Constrained Hyper-Connections (mHC)... | | Experimental |
| 87 | kyegomez/Open-Olmo | Unofficial open-source PyTorch implementation of the OLMo Hybrid... | | Experimental |
| 88 | telekom/transformer-tools | Transformers training tools | | Experimental |
| 89 | mcbal/deep-implicit-attention | Implementation of deep implicit attention in PyTorch | | Experimental |
| 90 | hasanisaeed/C-Transformer | Implementation of the core Transformer architecture in pure C | | Experimental |
| 91 | ArneBinder/pytorch-ie-hydra-template-1 | PyTorch-IE Hydra Template | | Experimental |
| 92 | arshadshk/Last_Query_Transformer_RNN-PyTorch | Implementation of the paper "Last Query Transformer RNN for knowledge... | | Experimental |
| 93 | fualsan/TransformerFromScratch | PyTorch implementation of the Transformer deep learning model | | Experimental |
| 94 | RJain12/choformer | CHO codon optimization (WIP) | | Experimental |
| 95 | FareedKhan-dev/Understanding-Transformers-Step-by-Step-math-example | Understanding the large language transformer architecture like a child | | Experimental |
| 96 | codyjk/ChessGPT | ♟️ A transformer that plays chess 🤖 | | Experimental |
| 97 | MurtyShikhar/TreeProjections | Tool to measure the tree-structuredness of the internal algorithm learnt by a... | | Experimental |
| 98 | chris-santiago/met | Reproducing the MET framework with PyTorch | | Experimental |
| 99 | xdevfaheem/Transformers | A comprehensive implementation of the Transformer architecture from scratch | | Experimental |
| 100 | crscardellino/argumentation-mining-transformers | Argumentation Mining Transformers Module (AMTM) implementation. | | Experimental |
| 101 | NiuTrans/Introduction-to-Transformers | An introduction to basic concepts of Transformers and key techniques of... | | Experimental |
| 102 | mtanghu/LEAP | LEAP: Linear Explainable Attention in Parallel for causal language modeling... | | Experimental |
| 103 | nullHawk/simple-transformer | Implementation of the Transformer model in PyTorch | | Experimental |
| 104 | KhaledSharif/robot-transformers | Train and evaluate an Action Chunking Transformer (ACT) to perform... | | Experimental |
| 105 | ziansu/codeart | Official repo for the FSE'24 paper "CodeArt: Better Code Models by Attention... | | Experimental |
| 106 | vmarinowski/infini-attention | An unofficial PyTorch implementation of "Efficient Infinite Context... | | Experimental |
| 107 | garyb9/pytorch-transformers | Transformer architecture code playground in Python using PyTorch. | | Experimental |
| 108 | mfekadu/nimbus-transformer | It's like Nimbus but uses a transformer language model | | Experimental |
| 109 | Uokoroafor/transformer_from_scratch | A PyTorch implementation of the Transformer model in the paper... | | Experimental |
| 110 | rishabkr/Attention-Is-All-You-Need-Explained-PyTorch | A paper implementation and tutorial from scratch combining various great... | | Experimental |
| 111 | davide-coccomini/TimeSformer-Video-Classification | The notebook explains the various steps to obtain the results of... | | Experimental |
| 112 | jaketae/tupe | PyTorch implementation of Rethinking Positional Encoding in Language Pre-training | | Experimental |
| 113 | gmontamat/poor-mans-transformers | Implement Transformers (and Deep Learning) from scratch in NumPy | | Experimental |
| 114 | bfilar/URLTran | PyTorch/HuggingFace implementation of URLTran: Improving Phishing URL... | | Experimental |
| 115 | trialandsuccess/verysimpletransformers | Very Simple Transformers provides a simplified interface for packaging,... | | Experimental |
| 116 | mingikang31/Convolutional-Nearest-Neighbor-Attention | Convolutional Nearest Neighbor Attention for Transformers | | Experimental |
| 117 | simboco/flash-linear-attention | 💥 Optimize linear attention models with efficient Triton-based... | | Experimental |
| 118 | Gala2044/Transformers-for-absolute-dummies | 🚀 Master transformers with this simple guide that breaks down complex... | | Experimental |
| 119 | kazuki-irie/kv-memory-brain | Official code repository for the paper "Key-value memory in the brain" | | Experimental |
| 120 | allenai/staged-training | Staged Training for Transformer Language Models | | Experimental |
| 121 | NTT123/sketch-transformer | Modeling the Quick, Draw! dataset using transformers | | Experimental |
| 122 | pelagecha/typ | Associative Memory Augmentation for Long-Context Retrieval in Transformers | | Experimental |
| 123 | teddykoker/grokking | PyTorch implementation of "Grokking: Generalization Beyond Overfitting on... | | Experimental |
| 124 | antofuller/configaformers | A Python library for highly configurable transformers - easing model... | | Experimental |
| 125 | mcbal/spin-model-transformers | Physics-inspired transformer modules based on mean-field dynamics of... | | Experimental |
| 126 | dpressel/mint | MinT: Minimal Transformer Library and Tutorials | | Experimental |
| 127 | rahul13ramesh/compositional_capabilities | Compositional Capabilities of Autoregressive Transformers: A Study on... | | Experimental |
| 128 | osiriszjq/impulse_init | Convolutional Initialization for Data-Efficient Vision Transformers | | Experimental |
| 129 | somosnlp/the-annotated-transformer | Spanish translation of the Harvard notebook "The Annotated Transformer"... | | Experimental |
| 130 | erfanzar/OST-OpenSourceTransformers | OST Collection: An AI-powered suite of models that predict the next word... | | Experimental |
| 131 | milistu/outformer | Clean outputs from language models | | Experimental |
| 132 | ArtificialZeng/transformers-Explained | Commentary on the official transformers source code. In the era of large AI models, PyTorch and transformers are the new operating system; everything else is software running on top of them. | | Experimental |
| 133 | declare-lab/KNOT | Implementation of the paper KNOT: Knowledge... | | Experimental |
| 134 | hmohebbi/ValueZeroing | The official repo for the EACL 2023 paper "Quantifying Context Mixing in... | | Experimental |
| 135 | dunktra/attention-binding-a11y | Code for tracking concept emergence via attention-head binding (EB*). Pythia... | | Experimental |
| 136 | ArpitKadam/Attention-Is-All-You-Code | From attention mechanisms to large language models, built from scratch. | | Experimental |
| 137 | hereandnowai/transformers-simplified | Simplified, standalone Python scripts for transformer models, LLMs, TTS,... | | Experimental |
| 138 | Brokttv/Transformer-from-scratch | An elaborate transformer implementation with a detailed explanation | | Experimental |
| 139 | ays-dev/keras-transformer | Encoder-decoder Transformer with cross-attention | | Experimental |
| 140 | hrithickcodes/transformer-tf | Code for the paper "Attention Is All You Need"... | | Experimental |
| 141 | mingikang31/Fully-Convolutional-Transformers | FCT: Fully Convolutional Transformers | | Experimental |
| 142 | KeepALifeUS/ml-attention-mechanisms | Flash Attention, RoPE, and multi-head attention for temporal patterns | | Experimental |
| 143 | Cobkgukgg/forgenn | Modern neural networks in pure NumPy - Transformers, ResNet, and more | | Experimental |
| 144 | mtingers/kompoz | kompoz: Composable predicate and transform combinators with operator overloading | | Experimental |
| 145 | marcolacagnina/transformer-for-code-analysis | PyTorch implementation of a Transformer encoder to predict the Big-O time... | | Experimental |
| 146 | gheb02/chess-transformer | Implements a KV cache mechanism in autoregressive... | | Experimental |
| 147 | Johnpaul10j/Transformers-with-keras | Uses the Keras library to build a transformer with a sequence-to-sequence... | | Experimental |
| 148 | jdmogollonp/tips-dpt-decoder | Implementation of the DeepMind TIPS DPT decoder | | Experimental |
| 149 | Abhinand20/MathFormer | MathFormer - solve math equations using NLP and transformers! | | Experimental |
| 150 | osiriszjq/structured_init | Structured Initialization for Attention in Vision Transformers | | Experimental |
| 151 | ansh-info/Titans-Learning-to-Memorize-at-Test-Time-with-Manim | Visual animated walkthroughs of the DeepMind paper "Titans: Learning to Memorize... | | Experimental |
| 152 | Bradley-Butcher/Conformers | Unofficial implementation of Conformal Language Modeling by Quach et al. | | Experimental |
| 153 | princeton-nlp/dyck-transformer | [ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages | | Experimental |
| 154 | shreydan/scratchformers | Building various transformer model architectures and their modules from scratch. | | Experimental |
| 155 | afspies/attention-tutorial | Jupyter Notebook tutorial on attention mechanisms, position embeddings and... | | Experimental |
| 156 | danadascalescu00/ioai-transformer-workshop | A hands-on introduction to the Transformer architecture, designed for... | | Experimental |
| 157 | 0xOpenBytes/c | 📦 Micro composition using transformations and cache | | Experimental |
| 158 | AMDonati/SMC-T-v2 | Code for the paper "The Monte Carlo Transformer: a stochastic self-attention... | | Experimental |
| 159 | shubhexists/transformers | Basic implementation of transformers | | Experimental |
| 160 | tech-srl/layer_norm_expressivity_role | Code for the paper "On the Expressivity Role of LayerNorm in Transformers'... | | Experimental |
| 161 | Anne-Andresen/Multi-Modal-cuda-C-GAN | Raw C/CUDA implementation of a 3D GAN | | Experimental |
| 162 | harrisonvshen/triton-accelerated-attention | Custom Triton GPU kernels for multi-head attention, including QK^T, softmax,... | | Experimental |
| 163 | KOKOSde/sparse-clt | Cross-Layer Transcoder (CLT) library for extracting sparse interpretable... | | Experimental |
| 164 | frikishaan/pytorch-transformers | Contains the original transformer model implementation code. | | Experimental |
| 165 | NeuralCoder3/custom_infinite_craft | A custom implementation of Infinite Craft (https://neal.fun/infinite-craft/) | | Experimental |
| 166 | BoCtrl-C/attention-rollout | Unofficial PyTorch implementation of Attention Rollout | | Experimental |
| 167 | mcbal/afem | Implementation of approximate free-energy minimization in PyTorch | | Experimental |
| 168 | hazdzz/converter | The official PyTorch implementation of Converter. | | Experimental |
| 169 | homerjed/transformer_flows | Implementation of Apple ML's Transformer Flow (or TARFlow) from "Normalising... | | Experimental |
| 170 | parham1998/Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling | Official PyTorch implementation of "Enhancing High-Vocabulary Image... | | Experimental |
| 171 | shilongdai/ROT5 | Small transformer trained from scratch | | Experimental |
| 172 | thiomajid/distil_xlstm | Learning attention mechanisms through recurrent structures | | Experimental |
| 173 | Jayluci4/micro-attention | Attention mechanism in ~50 lines - understand transformers by building from scratch | | Experimental |
| 174 | Prakhar-Bhartiya/Transformers_From_Scratch | A walkthrough that builds a Transformer from first principles inside Jupyter... | | Experimental |
| 175 | ArshockAbedan/Natural-Language-Processing-with-Attention-Models | Attention models in NLP | | Experimental |
| 176 | KOKOSde/sparse-transcoder | PyPI package for optimized sparse feature extraction from transformer... | | Experimental |
| 177 | pavlosdais/Transformers-Linear-Algebra | Transformer-based learning of fundamental linear algebra operations | | Experimental |
| 178 | Mozeel-V/nebula-mini | Minimal PyTorch-based Nebula pipeline replica for malware behavior modeling | | Experimental |
| 179 | tom-effernelli/small-LLM | Implementing the "Attention Is All You Need" paper through a simple LLM model | | Experimental |
| 180 | CESOIA/transformer-surgeon | Transformer model library with compression options | | Experimental |
| 181 | dlukeh/transformer-deep-dive | A deep descent into the neural abyss: understanding transformers through... | | Experimental |
| 182 | MrHenstep/NN_Self_Learn | Neural network architectures from perceptrons to GPT, built and trained from scratch | | Experimental |
| 183 | abc1203/transformer-model | An implementation of the transformer deep learning model, based on the... | | Experimental |
| 184 | ozyurtf/attention-and-transformers | A project to understand how Transformers work and... | | Experimental |
| 185 | macespinoza/mini-transformer-didactico | Didactic implementation of a Transformer encoder-decoder based on... | | Experimental |
| 186 | M-e-r-c-u-r-y/pytorch-transformers | Collection of different types of transformers for learning purposes | | Experimental |
| 187 | pranoyr/attention-models | Simplified implementations of SOTA deep learning papers in PyTorch | | Experimental |
| 188 | bikhanal/transformers | An implementation of the transformer as presented in the paper "Attention Is... | | Experimental |
| 189 | ghubnerr/attention-mechanisms | A compilation of most state-of-the-art attention mechanisms: MHSA, MQA, GQA,... | | Experimental |
| 190 | kyegomez/AttnWithConvolutions | Interleaved attention with convolutions for text modeling | | Experimental |
| 191 | kyegomez/GATS | Implementation of GATS from the paper "GATS: Gather-Attend-Scatter" in... | | Experimental |
| 192 | mawright/pytorch-sparse-utils | Low-level utilities for PyTorch sparse tensors and operations | | Experimental |
| 193 | gmongaras/Cottention_Transformer | Code for the paper "Cottention: Linear Transformers With Cosine Attention" | | Experimental |
| 194 | Vadimbuildercxx/looped_transformer | Experimental implementation of "Looped Transformers are Better at Learning... | | Experimental |
| 195 | rajveer43/titan_transformer | Unofficial implementation of the Titans transformer | | Experimental |
| 196 | Lucasc-99/NoTorch | A from-scratch neural network and transformer library, with speeds rivaling PyTorch | | Experimental |
| 197 | snoop2head/Deep-Encoder-Shallow-Decoder | 🤗 Hugging Face implementation of Kasai et al. (2020), "Deep Encoder, Shallow... | | Experimental |
| 198 | NathanLeroux-git/OnlineTransformerWithSpikingNeurons | Implementation of the Spiking Online Transformer of the... | | Experimental |
| 199 | HySonLab/HierAttention | Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range... | | Experimental |
| 200 | kyegomez/Mixture-of-MQA | An implementation of a Switch-Transformer-like multi-query attention model | | Experimental |
| 201 | yulang/phrasal-composition-in-transformers | Datasets and code for Assessing Phrasal Representation... | | Experimental |
| 202 | SyedAkramaIrshad/transformer-grokking-lab | Tiny Transformer grokking experiment with live notebook visualizations. | | Experimental |
| 203 | PeterJemley/Continuous-Depth-Transformers-with-Learned-Control-Dynamics | Hybrid transformer architecture replacing discrete layers with Neural ODE... | | Experimental |
| 204 | tzhengtek/saute | SAUTE is a lightweight transformer-based architecture adapted for dialog modeling | | Experimental |
| 205 | zzmtsvv/ad-gta | Grouped-Tied Attention by Zadouri, Strauss, Dao (2025). | | Experimental |
| 206 | Omikrone/Mnemos | Mnemos is a mini-LLM based on Transformers, designed for training and... | | Experimental |
| 207 | Carnetemperrado/x-transformers-rl | x-transformers-rl is a work-in-progress implementation of a transformer for... | | Experimental |
| 208 | VinkuraAI/AXEN-M | AXEN-M (Attention eXtended Efficient Network - Model) is a powerful... | | Experimental |
| 209 | awadalaa/transact | An unofficial implementation of "TransAct: Transformer-based Realtime User... | | Experimental |
| 210 | moskomule/simple_transformers | Simple transformer implementations that I can understand | | Experimental |
| 211 | SergioArnaud/attention-is-all-you-need | Implementation of a transformer following the "Attention Is All You Need" paper | | Experimental |
| 212 | lorenzobalzani/nlp-dl-experiments | Python implementation of deep learning models, with a focus on NLP. | | Experimental |
| 213 | agasheaditya/handson-transformers | End-to-end implementation of Transformers from scratch using PyTorch | | Experimental |
| 214 | kyegomez/MultiQuerySuperpositionAttention | Multi-Query Attention with Sub-linear Masking, Superposition, and Entanglement | | Experimental |
| 215 | Sarhamam/ZetaFormer | Curriculum learning framework that uses geometrically structured datasets... | | Experimental |
| 216 | viktor-shcherb/qk-pca-analysis | PCA analysis of Q/K attention vectors to discover position-correlated... | | Experimental |
| 217 | R2D2-08/turmachpy | A Python package for simulating a variety of Turing machines. | | Experimental |
| 218 | Sid7on1/Transformer-256dim | A powerful Transformer architecture built from scratch by Prajwal for... | | Experimental |
| 219 | DzmitryPihulski/Encoder-transformer-from-scratch | Fully functional encoder transformer, from tokenizer to LM head | | Experimental |
| 220 | tegridydev/hydraform | Self-evolving Python transformer research | | Experimental |
| 221 | hunterhammond-dev/attention-mechanisms-in-transformers | Learn and visualize attention mechanisms in transformer models, inspired by... | | Experimental |
| 222 | viktor-shcherb/qk-sniffer | Capture sampled Q/K attention vectors from HF transformers into per-branch... | | Experimental |
| 223 | pedrocurvo/HAET | HAET: Hierarchical Attention Erwin Transolver is a hybrid neural... | | Experimental |
| 224 | NLP-Project-PoliMi-2025/NLP-Project | Can chess be tackled using NLP techniques? "Natural Language Processing"... | | Experimental |
| 225 | arvind207kumar/Time-Cross-Adaptive-Self-Attention-TCSA-based-Imputation-model- | Time-Cross Adaptive Self-Attention (TCSA) model for multivariate time... | | Experimental |
| 226 | kanenorman/grassmann | Attempt at reproducing "Attention Is Not What You Need: Grassmann Flows as... | | Experimental |
| 227 | richengguy/calc.ai | Transformer-based calculator | | Experimental |
| 228 | sarabesh/exploring-transformers | A typical repo containing code written while learning transformers... | | Experimental |
| 229 | Chamiln17/Transformer-From-Scratch | My implementation of the transformer architecture described in the paper... | | Experimental |
| 230 | rashi-bhansali/encoder-decoder-transformer-variants-from-scratch | PyTorch implementation of a Transformer encoder and GPT-style decoder with... | | Experimental |
| 231 | chaowei312/HyperGraph-Sparse-Attention | Sparse attention via hypergraph partitioning for efficient long-context transformers | | Experimental |
| 232 | wildanjr19/transformers-from-scratch | Implementing Transformers from the "Attention Is All You Need" paper from scratch. | | Experimental |
| 233 | sathishkumar67/Byte-Latent-Transformer | Implementation of the Byte Latent Transformer | | Experimental |
| 234 | benearnthof/SparseTransformers | Reproducing the paper Generating Long Sequences with Sparse Transformers by... | | Experimental |
| 235 | adityakamat24/triton-fast-mha | A high-performance kernel implementation of multi-head attention using... | | Experimental |
| 236 | albertkjoller/transformer-redundancy | Code for the paper "How Redundant Is the Transformer Stack in Speech... | | Experimental |
| 237 | isakovaad/fedcsis25 | A machine learning project to predict chess puzzle difficulty ratings using... | | Experimental |
| 238 | AnkitaMungalpara/Building-DeepSeek-From-Scratch | Shows how to build a DeepSeek language model from scratch... | | Experimental |
| 239 | balamarimuthu/deep-learning-with-pytorch | Contains a minimal PyTorch-based Transformer model... | | Experimental |
| 240 | Joe-Naz01/transformers | A deep learning project that implements and explains the fundamental... | | Experimental |
| 241 | Projects-Developer/Transformer-Models-For-NLP-Applications | Includes source code, PPT, synopsis, report, documents, base research paper... | | Experimental |
| 242 | samaraxmmar/transformer-explained | A hands-on guide to understanding and building Transformer models from... | | Experimental |
| 243 | graphcore-research/flash-attention-ipu | Poplar implementation of FlashAttention for IPU | | Experimental |
| 244 | gustavecortal/transformer | Slides from my NLP course on the transformer architecture | | Experimental |
| 245 | kikirizki/transformer | Minimalistic PyTorch implementation of the transformer | | Experimental |
| 246 | BramVanroy/lt3-2019-transformer-trainer | Transformer trainer for a variety of classification problems that has been... | | Experimental |
| 247 | lmxx1234567/goofy-hydra | Goofy Hydra is a transport-layer link aggregator based on a Transformer | | Experimental |
| 248 | ytgui/SPT-proto | Includes a Sparse Transformer implementation which utilizes PQ to... | | Experimental |
| 249 | Dhyanam04/ByteFetcher | This is ByteFetcher | | Experimental |
| 250 | dariush-bahrami/mytransformers | My implementation of transformers | | Experimental |
| 251 | ander-db/Transformers-PytorchLightning | 👋 My implementation of the Transformer architecture from scratch... | | Experimental |
| 252 | Jourdelune/Transformer | My implementation of the transformer architecture from the paper "Attention... | | Experimental |
| 253 | maxime7770/Transformers-Insights | Exploring how Transformers actually transform the data under the hood | | Experimental |
| 254 | ariva00/GaussianAttention4Matching | Code for the models described in the paper Localized Gaussians as... | | Experimental |
| 255 | kyegomez/open-text-embedding-ada-002 | A production-grade implementation of a... | | Experimental |
| 256 | Ranjit2111/Transformer-NMT | A PyTorch implementation of the Transformer architecture from "Attention Is... | | Experimental |
| 257 | AlperYildirim1/Attention-is-All-You-Need-Pytorch | A fully reproducible, high-performance PyTorch Colab implementation of the... | | Experimental |
| 258 | hash-ir/transformer-lab | Hands-on implementation of the transformer and related models | | Experimental |
| 259 | pplkit/AllYouNeedIsAttention | An efficient and robust implementation of the seminal "Attention Is All You... | | Experimental |
| 260 | fatou1526/Pytorch_Transformers | Code for PyTorch models, from how to define the... | | Experimental |
| 261 | girishdhegde/NLP | Implementation of deep-learning-based language models from scratch in PyTorch | | Experimental |
| 262 | microcoder-py/attn-is-all-you-need | A TFX implementation of the transformer paper, Attention Is All You Need | | Experimental |
| 263 | shahrukhx01/transformers-bisected | All the building blocks of the transformer model for text... | | Experimental |
| 264 | Ipvikukiepki-KQS/progressive-transformers | A neural network architecture for building conversational agents | | Experimental |
| 265 | devrahulbanjara/Transformers-from-Scratch | A repository implementing Transformers from scratch using PyTorch, designed... | | Experimental |
| 266 | JHansiduYapa/Transformer-Model-from-Scratch | Build a Transformer model from scratch using PyTorch, implementing key... | | Experimental |
| 267 | NipunRathore/NLP-Transformers-from-Scratch | Pre-training a Transformer from scratch. | | Experimental |
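Once the JSON is downloaded, the tier and score fields can be used to slice the list locally. The sketch below assumes the API returns an array of objects with `name`, `score`, and `tier` fields; those field names and the sample records are illustrative assumptions, not documented output, so verify them against a real response.

```python
import json

# Hypothetical sample mirroring the assumed response shape; the repo names
# and scores here are placeholders, not real dataset values.
sample = json.loads("""[
  {"name": "example/repo-a", "score": 79, "tier": "Verified"},
  {"name": "example/repo-b", "score": 55, "tier": "Established"},
  {"name": "example/repo-c", "score": 61, "tier": "Established"}
]""")

def by_tier(projects, tier):
    """Return the projects in the given tier, highest score first."""
    return sorted(
        (p for p in projects if p["tier"] == tier),
        key=lambda p: p["score"],
        reverse=True,
    )

established = by_tier(sample, "Established")
```

The same helper works for any of the four tiers on this page (Verified, Established, Emerging, Experimental).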