LLM Interpretability & Explainability Tools

Tools and frameworks for understanding, explaining, and visualizing how large language models make decisions through mechanistic analysis, post-hoc explanations, concept-based interpretability, and neuron-level attribution methods. Does NOT include general model evaluation, bias detection, hallucination mitigation, or knowledge editing.

There are 51 LLM interpretability & explainability tools tracked. One scores above 50 (Established tier). The highest-rated is filipnaudot/llmSHAP at 51/100, with 16 stars and 114 monthly downloads.

Get the tracked projects as JSON (the example query below passes limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-interpretability-explainability&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
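The same request can be made from a script. The following is a minimal Python sketch that rebuilds the URL from the curl command above and decodes the response body as JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical, not part of any published client, and the response schema is not described on this page, so the parsed value is returned as-is rather than unpacked into fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int) -> str:
    """Assemble the dataset URL (hypothetical helper, not an official SDK call).

    The three query parameters are the ones visible in the curl example;
    no other parameters are assumed to exist.
    """
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(url: str, timeout: float = 30.0):
    """Send a GET request and decode the body as JSON.

    The response structure is undocumented here, so the decoded value is
    returned unchanged for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality(build_quality_url("llm-tools", "llm-interpretability-explainability", 20))` issues the same unauthenticated request as the curl command, so it counts against the 100 requests/day allowance.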

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | filipnaudot/llmSHAP | llmSHAP: a multi-threaded explainability framework using Shapley values for... | 51 | Established |
| 2 | microsoft/automated-brain-explanations | Generating and validating natural-language explanations for the brain. | 41 | Emerging |
| 3 | CAS-SIAT-XinHai/CPsyCoun | [ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and... | 37 | Emerging |
| 4 | wesg52/universal-neurons | Universal Neurons in GPT2 Language Models | 32 | Emerging |
| 5 | ICTMCG/LLM-for-misinformation-research | Paper list of misinformation research using (multi-modal) large language... | 31 | Emerging |
| 6 | phvv-me/frame-representation-hypothesis | Official Repository for Frame Representation Hypothesis paper | 29 | Experimental |
| 7 | marcusm117/IdentityChain | [ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large... | 29 | Experimental |
| 8 | UKPLab/naacl2025-cove | Code associated with the NAACL 2025 paper "COVE: COntext and VEracity... | 28 | Experimental |
| 9 | shahriargolchin/DCQ | The official repository for the paper entitled "Data Contamination Quiz: A... | 28 | Experimental |
| 10 | songxiaoshuai/progco | Official Implementation of "ProgCo: Program Helps Self-Correction of Large... | 27 | Experimental |
| 11 | Wang-ML-Lab/interpretable-foundation-models | [ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy... | 24 | Experimental |
| 12 | OpenMOSS/Say-I-Dont-Know | [ICML'2024] Can AI Assistants Know What They Don't Know? | 23 | Experimental |
| 13 | plusnli/medical-knowledge-judgment | Codes and data for paper "Fact or Guesswork? Evaluating Large Language... | 23 | Experimental |
| 14 | GS-Uni-Heidelberg/Paper-TheMoralizationCorpus | Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse... | 23 | Experimental |
| 15 | Jonny-English/learn-interpretability | Bilingual Colab-first mechanistic interpretability course with paper... | 22 | Experimental |
| 16 | Joe-b-20/CoreVital | Mechanistic interpretability toolkit for monitoring LLM internal health.... | 22 | Experimental |
| 17 | Faisalse/LLM-reproducibility-audit | https://faisalse.github.io/LLM-reproducibility-audit/ | 22 | Experimental |
| 18 | UKPLab/arxiv2025-misleading-visualizations | Code and datasets accompanying the arXiv preprint: "Protecting multimodal... | 22 | Experimental |
| 19 | OSU-NLP-Group/AttrScore | Code, datasets, models for the paper "Automatic Evaluation of Attribution by... | 22 | Experimental |
| 20 | amazon-science/ContraCLM | [ACL 2023] Code for ContraCLM: Contrastive Learning For Causal Language Model | 22 | Experimental |
| 21 | LFhase/CausalCOAT | [NeurIPS 2024] Discovery of the Hidden World with Large Language Models | 20 | Experimental |
| 22 | youzhaozhao/LLM-Heuristic-Graph-Coloring | Exploring LLM-assisted design of graph coloring heuristics through ... | 19 | Experimental |
| 23 | Nearzero-S/Intuitive-MechInterp | Helping Humans Understand Our Processing | 19 | Experimental |
| 24 | MozerWang/DEMO | [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained... | 19 | Experimental |
| 25 | kasia-kobalczyk/guess_llm | Implementation of the probing models presented in the ICLR 2026 paper... | 19 | Experimental |
| 26 | AColonnaDistria/llm2sql-consistency-analysis | LLM-to-SQL analysis tool designed to quantify non-determinism behavior of... | 19 | Experimental |
| 27 | YuweiYin/SWI | SWI: Speaking with Intent in Large Language Models | 19 | Experimental |
| 28 | ekaterinasviridova4/Investigating_implicitness_in_user_generated_argumentative_text | This repository contains the dataset and implementation details of the paper... | 19 | Experimental |
| 29 | stefdesabbata/geospatial-mechanistic-interpretability | Geospatial Mechanistic Interpretability of Large Language Models | 17 | Experimental |
| 30 | Strong-AI-Lab/Explanation-Generation | We introduce "ILearner-LLM" a framework that uses iterative enhancement with... | 17 | Experimental |
| 31 | tbohne/saliency_kd | Saliency map-guided knowledge discovery for subclass identification with... | 16 | Experimental |
| 32 | HamedBabaei/CoLLM | CoLLM: Consistency of Large Language Models in Knowledge Engineering | 16 | Experimental |
| 33 | 12kimih/HiCUPID | [ACL 2025] Exploring the Potential of LLMs as Personalized Assistants:... | 16 | Experimental |
| 34 | DataScienceUIBK/llm-reranking-generalization-study | How Good are LLM-based Rerankers? Accepted at EMNLP Findings 2025 | 16 | Experimental |
| 35 | lindaCai1997/data-attribution | Scalable Gradient-Based Attribution of LLM Behaviors | 15 | Experimental |
| 36 | Trustworthy-ML-Lab/Efficient-LLM-automated-interpretability | [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs | 15 | Experimental |
| 37 | armlynobinguar/LLM-XAI-Papers | A curated collection of research papers on explainability and... | 15 | Experimental |
| 38 | GovAIx/QualityModulation | [Nature Communications] Linguistic features of AI mis/disinformation and the... | 15 | Experimental |
| 39 | Aniezka/xfact-fever | Official repository of FEVER@ACL 2025 paper "When Scale Meets Diversity:... | 15 | Experimental |
| 40 | froge159/belief-project-sef | Activation-Space Interventions for Causal Control of Belief Representations... | 14 | Experimental |
| 41 | jiangjiechen/uncommongen | Resources for our ACL 2023 paper: "Say What You Mean! Large Language Models... | 14 | Experimental |
| 42 | braingpt-lovelab/backwards | Source code for | 14 | Experimental |
| 43 | emanuelemessina/broken-morals | Moral copilot for high-stakes ethical decisions in business contexts | 13 | Experimental |
| 44 | psunlpgroup/VerbosityLLM | This repository maintains dataset, predictions, and code for paper:... | 13 | Experimental |
| 45 | k-randl/self-explaining_llms | Official implementation of the papers "Evaluating the Reliability of... | 12 | Experimental |
| 46 | dennismstfc/building-the-soedermizer | Building the Söd★mizer: AI-Driven Machine Translation for Gender-Sensitive... | 12 | Experimental |
| 47 | DAMO-NLP-SG/LLM-argumentation | [ACL2024] Exploring the Potential of Large Language Models in Computational... | 12 | Experimental |
| 48 | gianniskalyvas/llm-posthoc-explainability | A study on post-hoc explainability in LLMs using counterfactual... | 11 | Experimental |
| 49 | phvv-me/icip2025 | Official Repository for Vision Language Model Interpretability with Concept... | 11 | Experimental |
| 50 | stvsever/aHFR_TokenSHAP | This repository implements an adaptive, hierarchy-aware Shapley method for... | 11 | Experimental |
| 51 | ShiningLab/CON2LM | This repository is for the paper Word Surprisal Correlates with Sentential... | 11 | Experimental |