Mechanistic Transformer Interpretability Models
Tools for understanding transformer internals through visualization, attribution analysis, and mechanistic reverse-engineering of learned circuits and representations. Does NOT include general explainability frameworks, dataset analysis tools, or applications built on transformers.
57 mechanistic transformer interpretability projects are tracked, of which 3 score above 50 (the Established tier). The highest-rated is inseq-team/inseq at 67/100, with 462 stars and 739 monthly downloads.
Get all 57 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-interpretability-mechanistic&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
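For scripted access, the same request can be made from Python's standard library. This is a minimal sketch rather than an official client: the endpoint and the `domain`, `subcategory`, and `limit` parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not documented on this page, so the decoded JSON is returned unmodified, and whether a `limit` above 20 is accepted is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken verbatim from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL (hypothetical helper, not part of any published client)."""
    params = {
        "domain": "transformers",
        "subcategory": "transformer-interpretability-mechanistic",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """GET the listing and decode the body as JSON.

    The response shape is not documented here, so it is returned as-is.
    Each call counts against the 100 requests/day keyless quota.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to perform the request.
    print(build_url())
```

Keeping URL construction separate from the network call makes it possible to check the query string without spending any of the daily request quota.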
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | inseq-team/inseq: Interpretability for sequence generation models | | Established |
| 2 | jessevig/bertviz: BertViz: Visualize Attention in Transformer Models | | Established |
| 3 | EleutherAI/knowledge-neurons: A library for finding knowledge neurons in pretrained transformer models. | | Established |
| 4 | hila-chefer/Transformer-MM-Explainability: [ICCV 2021- Oral] Official PyTorch implementation for Generic... | | Emerging |
| 5 | cdpierse/transformers-interpret: Model explainability that works seamlessly with 🤗 transformers. Explain your... | | Emerging |
| 6 | taufeeque9/codebook-features: Sparse and discrete interpretability tool for neural networks | | Emerging |
| 7 | icon-lab/BolT: Fused Window Transformers for fMRI Time Series Analysis... | | Emerging |
| 8 | DFKI-NLP/thermostat: Collection of NLP model explanations and accompanying analysis tools | | Emerging |
| 9 | tongnie/ImputeFormer: [KDD 2024] "ImputeFormer: Low Rankness-Induced Transformers for... | | Emerging |
| 10 | xmed-lab/TAM: [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs | | Emerging |
| 11 | Sandipan99/IndMask: IndMask: Inductive Explanation for Multivariate Time Series Black-box Model | | Emerging |
| 12 | bvanaken/visbert: VisBERT: Demo web app for "How Does BERT Answer Questions?" | | Experimental |
| 13 | jakobtroidl/neuron-shape-reasoning: PyTorch Implementation of Global Neuron Shape Reasoning with Point Affinity... | | Experimental |
| 14 | andreped/vit-explainer: Demonstrating Explainable AI with Vision Transformer in web app | | Experimental |
| 15 | ApocryphalEditor/SRM-mapping-framework: A framework for mapping the internal geometry of transformer representations... | | Experimental |
| 16 | gsarti/lcl23-xnlm-lab: Materials for the Lab "Explaining Neural Language Models from Internal... | | Experimental |
| 17 | Lumi-node/model-garage: Open the hood on neural networks. Component-level model surgery, analysis,... | | Experimental |
| 18 | munnabhaiiii981/llm-attention-visualizer: Visualize attention patterns in transformer models to better understand... | | Experimental |
| 19 | ayaka14732/TrAVis: TrAVis: Visualise BERT attention in your browser | | Experimental |
| 20 | s4um1l/aya-cross-lingual-probe: Mechanistic interpretability of cross-lingual concept representations in... | | Experimental |
| 21 | mims-harvard/TimeX: Time series explainability via self-supervised model behavior consistency | | Experimental |
| 22 | ovshake/rat: Reverse Attention Tracer: A lightweight API to visualize which words... | | Experimental |
| 23 | rashomon-gh/attention-visualiser: a module to visualise attention layer activations from transformer based... | | Experimental |
| 24 | poppingtonic/transformer-visualization: Mechanistic Interpretability Tutorials, Results and research log as I learn... | | Experimental |
| 25 | rubencart/LIIR-TextGraphs-14: Code for KU Leuven LIIR lab's submission to the TextGraphs-14 shared task on... | | Experimental |
| 26 | designer-coderajay/logit-lens-explorer: Mechanistic interpretability tool visualizing GPT-2's layer-by-layer... | | Experimental |
| 27 | khairulislam/Timeseries-Explained: Interpreting Deep Learning timeseries models using Local Interpretation methods | | Experimental |
| 28 | skyline-GTRr32/OKI-TRACE: OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what... | | Experimental |
| 29 | mytechnotalent/mechanistic_interpretability: Mechanistic Interpretability (MI) is a subfield of AI alignment and safety... | | Experimental |
| 30 | MaxwellCalkin/interpretability-toolkit: Practical mechanistic interpretability tools: activation caching, linear... | | Experimental |
| 31 | JihoonJeong/Neural-MRI: Model Resonance Imaging: visualize LLM internals like a brain MRI | | Experimental |
| 32 | Benjoyo/next-token-visualization: Visualize token-by-token sampling with chat templates, nucleus filtering,... | | Experimental |
| 33 | Alvoradozerouno/ORION-MIT-Interpretability-Bridge: ORION MIT Interpretability Bridge: MIT research + consciousness... | | Experimental |
| 34 | designer-coderajay/induction-head-detector: Mechanistic interpretability tool to detect induction heads in GPT-2 using... | | Experimental |
| 35 | davor10105/relative-absolute-magnitude-propagation: Explain the outputs of your Vision Transformers, Residual Networks and... | | Experimental |
| 36 | sandipan211/LoCATe-GAT: Official PyTorch implementation of the IEEE TETCI 2024 paper LoCATe-GAT | | Experimental |
| 37 | tegridydev/mechamap: MechaMap - Toolkit for Mechanistic Interpretability (MI) Research | | Experimental |
| 38 | luckyspaceOK/llm-attention-visualizer: Visualize attention patterns in transformer models to better understand... | | Experimental |
| 39 | sinaabbasi1/NormXLogit: The official repo for the EMNLP 2025 paper "NormXLogit: The Head-on-Top Never Lies" | | Experimental |
| 40 | erfanashams/steve: Speech Self-Attention Exploratory Visual Environment | | Experimental |
| 41 | DFKI-NLP/SMV: Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map... | | Experimental |
| 42 | zzak00/nlp_with_transformers_visualizations: Visualize NLP | | Experimental |
| 43 | Shravani018/interpreting-transformer-hallucinations: Mechanistic interpretability of transformer hallucinations via attention... | | Experimental |
| 44 | garimamittal13/csai_S26: Neuroimaging preprocessing, brain decoding, and visual brain encoding using... | | Experimental |
| 45 | fracapuano/brainformer: A transformer-based approach to predicting MEG readings from EEG sensory... | | Experimental |
| 46 | amrohendawi/unraveling-bert-article: In this article, the factors affecting BERT's transferability is explained... | | Experimental |
| 47 | chizkidd/bert-masked-attention-visualizer: Visualizing and analyzing BERT self-attention heads during masked language modeling. | | Experimental |
| 48 | dedely/XAI4EO: Towards Explainable AI4EO: an explainable DL approach for crop type mapping... | | Experimental |
| 49 | Krasnomakov/openMaze_XAI: Explainable AI, attention visualization in LLM | | Experimental |
| 50 | rey-reypixel/NeuroWeave: A client-side simulation of NLP Transformer models. Visualizes... | | Experimental |
| 51 | Param-Uttarwar/neural-network-visualizer: Easy-to-use UI based tool that visualizes the internal layers and... | | Experimental |
| 52 | HillaryDanan/relativistic-interpretability: A geometric framework for understanding neural network reasoning through... | | Experimental |
| 53 | jha-lab/dini: [Nature-SR'22] DINI: Data Imputation using Neural Inversion | | Experimental |
| 54 | jacoboromerodiaz/context-mixing-audio-text: Attribution framework for analyzing audio-text context mixing in... | | Experimental |
| 55 | gszfwsb/AutoGnothi: Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering... | | Experimental |
| 56 | VDuchauffour/transformers-visualizer: Explain your 🤗 transformers without effort! Plot the internal behavior of your model. | | Experimental |
| 57 | alejoacelas/bayesian-transformers: Interpretability on 1-layer Transformer models that converge on the... | | Experimental |