Transformer Mechanistic Interpretability Tools

Tools for understanding transformer internals through visualization, attribution analysis, and mechanistic reverse-engineering of learned circuits and representations. Does NOT include general explainability frameworks, dataset analysis tools, or applications built on transformers.

There are 57 transformer mechanistic-interpretability projects tracked. 3 score 50 or above (Established tier). The highest-rated is inseq-team/inseq at 67/100 with 462 stars and 739 monthly downloads.
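The tier labels in the listing on this page are consistent with simple score thresholds. The sketch below is an illustration only: the function name `tier_for` is hypothetical, the 50 cutoff matches the lowest Established score shown (50), and the 30 cutoff is an assumption, since the listing only shows that 31 is Emerging and 29 is Experimental.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Thresholds are inferred from the listing, not taken from any
    published scoring rule:
      - 50 is the lowest score shown as Established
      - 31 is the lowest score shown as Emerging
      - 29 is the highest score shown as Experimental
    The exact Emerging cutoff (30 here) is an assumption.
    """
    if score >= 50:
        return "Established"
    if score >= 30:  # assumed boundary; the listing has no project at exactly 30
        return "Emerging"
    return "Experimental"
```

For example, `tier_for(67)` gives the tier shown for inseq-team/inseq and `tier_for(29)` gives the tier shown for bvanaken/visbert.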

Get all 57 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-interpretability-mechanistic&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
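The same request can be made from Python with the standard library alone. This is a minimal sketch, assuming the endpoint returns a JSON body; the response schema is not documented on this page, so the code does not assume any field names. The helper names `build_url` and `fetch_projects` are hypothetical. Note that the command above passes `limit=20`; whether a larger value returns all 57 projects in one call is not stated here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the query URL; only `limit` is varied, the rest matches the page."""
    params = {
        "domain": "transformers",
        "subcategory": "transformer-interpretability-mechanistic",
        "limit": limit,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 30.0):
    """Perform the GET request and parse the body as JSON.

    The structure of the returned object is an unknown here, so it is
    returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` issues one unauthenticated request, which counts against the 100 requests/day allowance mentioned above.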

# Model Score Tier
1 inseq-team/inseq

Interpretability for sequence generation models 🐛 🔍

67
Established
2 jessevig/bertviz

BertViz: Visualize Attention in Transformer Models

65
Established
3 EleutherAI/knowledge-neurons

A library for finding knowledge neurons in pretrained transformer models.

50
Established
4 hila-chefer/Transformer-MM-Explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic...

46
Emerging
5 cdpierse/transformers-interpret

Model explainability that works seamlessly with 🤗 transformers. Explain your...

44
Emerging
6 taufeeque9/codebook-features

Sparse and discrete interpretability tool for neural networks

42
Emerging
7 icon-lab/BolT

Fused Window Transformers for fMRI Time Series Analysis...

38
Emerging
8 DFKI-NLP/thermostat

Collection of NLP model explanations and accompanying analysis tools

36
Emerging
9 tongnie/ImputeFormer

[KDD 2024] "ImputeFormer: Low Rankness-Induced Transformers for...

31
Emerging
10 xmed-lab/TAM

[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs

31
Emerging
11 Sandipan99/IndMask

IndMask: Inductive Explanation for Multivariate Time Series Black-box Model

31
Emerging
12 bvanaken/visbert

VisBERT: Demo web app for "How Does BERT Answer Questions?"

29
Experimental
13 jakobtroidl/neuron-shape-reasoning

PyTorch Implementation of Global Neuron Shape Reasoning with Point Affinity...

27
Experimental
14 andreped/vit-explainer

🔥 Demonstrating Explainable AI with Vision Transformer in web app

26
Experimental
15 ApocryphalEditor/SRM-mapping-framework

A framework for mapping the internal geometry of transformer representations...

25
Experimental
16 gsarti/lcl23-xnlm-lab

Materials for the Lab "Explaining Neural Language Models from Internal...

25
Experimental
17 Lumi-node/model-garage

Open the hood on neural networks. Component-level model surgery, analysis,...

25
Experimental
18 munnabhaiiii981/llm-attention-visualizer

🔍 Visualize attention patterns in transformer models to better understand...

24
Experimental
19 ayaka14732/TrAVis

TrAVis: Visualise BERT attention in your browser

24
Experimental
20 s4um1l/aya-cross-lingual-probe

Mechanistic interpretability of cross-lingual concept representations in...

23
Experimental
21 mims-harvard/TimeX

Time series explainability via self-supervised model behavior consistency

23
Experimental
22 ovshake/rat

Reverse Attention Tracer: A lightweight API to visualize which words...

22
Experimental
23 rashomon-gh/attention-visualiser

a module to visualise attention layer activations from transformer based...

22
Experimental
24 poppingtonic/transformer-visualization

Mechanistic Interpretability Tutorials, Results and research log as I learn...

22
Experimental
25 rubencart/LIIR-TextGraphs-14

Code for KU Leuven LIIR lab's submission to the TextGraphs-14 shared task on...

22
Experimental
26 designer-coderajay/logit-lens-explorer

Mechanistic interpretability tool visualizing GPT-2's layer-by-layer...

21
Experimental
27 khairulislam/Timeseries-Explained

Interpreting Deep Learning timeseries models using Local Interpretation methods

20
Experimental
28 skyline-GTRr32/OKI-TRACE

OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what...

20
Experimental
29 mytechnotalent/mechanistic_interpretability

Mechanistic Interpretability (MI) is a subfield of AI alignment and safety...

20
Experimental
30 MaxwellCalkin/interpretability-toolkit

Practical mechanistic interpretability tools — activation caching, linear...

19
Experimental
31 JihoonJeong/Neural-MRI

Model Resonance Imaging — visualize LLM internals like a brain MRI

19
Experimental
32 Benjoyo/next-token-visualization

🧠 Visualize token-by-token sampling with chat templates, nucleus filtering,...

19
Experimental
33 Alvoradozerouno/ORION-MIT-Interpretability-Bridge

ORION MIT Interpretability Bridge — MIT research + consciousness...

19
Experimental
34 designer-coderajay/induction-head-detector

Mechanistic interpretability tool to detect induction heads in GPT-2 using...

16
Experimental
35 davor10105/relative-absolute-magnitude-propagation

Explain the outputs of your Vision Transformers, Residual Networks and...

16
Experimental
36 sandipan211/LoCATe-GAT

Official PyTorch implementation of the IEEE TETCI 2024 paper LoCATe-GAT

15
Experimental
37 tegridydev/mechamap

MechaMap - Toolkit for Mechanistic Interpretability (MI) Research

15
Experimental
38 luckyspaceOK/llm-attention-visualizer

🔍 Visualize attention patterns in transformer models to better understand...

15
Experimental
39 sinaabbasi1/NormXLogit

The official repo for the EMNLP 2025 paper "NormXLogit: The Head-on-Top Never Lies"

15
Experimental
40 erfanashams/steve

Speech Self-Attention Exploratory Visual Environment

14
Experimental
41 DFKI-NLP/SMV

Code and data for the ACL 2023 NLReasoning Workshop paper "Saliency Map...

14
Experimental
42 zzak00/nlp_with_transformers_visualizations

Visualize NLP

14
Experimental
43 Shravani018/interpreting-transformer-hallucinations

Mechanistic interpretability of transformer hallucinations via attention...

14
Experimental
44 garimamittal13/csai_S26

Neuroimaging preprocessing, brain decoding, and visual brain encoding using...

14
Experimental
45 fracapuano/brainformer

A transformer-based approach to predicting MEG readings from EEG sensory...

13
Experimental
46 amrohendawi/unraveling-bert-article

In this article, the factors affecting BERT's transferability is explained...

12
Experimental
47 chizkidd/bert-masked-attention-visualizer

Visualizing and analyzing BERT self-attention heads during masked language modeling.

12
Experimental
48 dedely/XAI4EO

Towards Explainable AI4EO: an explainable DL approach for crop type mapping...

12
Experimental
49 Krasnomakov/openMaze_XAI

Explainable AI, attention visualization in LLM

11
Experimental
50 rey-reypixel/NeuroWeave

A client-side simulation of NLP Transformer models. Visualizes...

11
Experimental
51 Param-Uttarwar/neural-network-visualizer

Easy-to-use UI based tool that visualizes the internal layers and...

11
Experimental
52 HillaryDanan/relativistic-interpretability

A geometric framework for understanding neural network reasoning through...

11
Experimental
53 jha-lab/dini

[Nature-SR'22] DINI: Data Imputation using Neural Inversion

11
Experimental
54 jacoboromerodiaz/context-mixing-audio-text

Attribution framework for analyzing audio–text context mixing in...

11
Experimental
55 gszfwsb/AutoGnothi

Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering...

11
Experimental
56 VDuchauffour/transformers-visualizer

Explain your 🤗 transformers without effort! Plot the internal behavior of your model.

10
Experimental
57 alejoacelas/bayesian-transformers

Interpretability on 1-layer Transformer models that converge on the...

10
Experimental