LLM Interpretability & Explainability Tools
Tools and frameworks for understanding, explaining, and visualizing how large language models make decisions through mechanistic analysis, post-hoc explanations, concept-based interpretability, and neuron-level attribution methods. Does NOT include general model evaluation, bias detection, hallucination mitigation, or knowledge editing.
51 LLM interpretability & explainability tools are tracked. One scores above 50 (Established tier). The highest-rated is filipnaudot/llmSHAP at 51/100, with 16 stars and 114 monthly downloads.
Get the 51 projects as JSON (the example request sets limit=20):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-interpretability-explainability&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
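The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the shape of the returned JSON is not documented here so the result is returned unparsed beyond `json.load`, and whether the API accepts a `limit` larger than the 20 shown in the curl example is an assumption you would need to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL (hypothetical helper, not part of any official client)."""
    params = {
        "domain": "llm-tools",
        "subcategory": "llm-interpretability-explainability",
        "limit": limit,  # assumption: values other than 20 are accepted
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and decode the JSON body.

    The response schema is not described on this page, so the decoded
    object is returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request and counts against the daily quota):
#   data = fetch_projects()
#   print(json.dumps(data, indent=2)[:500])
```

Keeping the URL construction separate from the network call makes it easy to check the exact request before spending one of the 100 keyless requests per day.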
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | filipnaudot/llmSHAP | llmSHAP: a multi-threaded explainability framework using Shapley values for... | | Established |
| 2 | microsoft/automated-brain-explanations | Generating and validating natural-language explanations for the brain. | | Emerging |
| 3 | CAS-SIAT-XinHai/CPsyCoun | [ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and... | | Emerging |
| 4 | wesg52/universal-neurons | Universal Neurons in GPT2 Language Models | | Emerging |
| 5 | ICTMCG/LLM-for-misinformation-research | Paper list of misinformation research using (multi-modal) large language... | | Emerging |
| 6 | phvv-me/frame-representation-hypothesis | Official Repository for Frame Representation Hypothesis paper | | Experimental |
| 7 | marcusm117/IdentityChain | [ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large... | | Experimental |
| 8 | UKPLab/naacl2025-cove | Code associated with the NAACL 2025 paper "COVE: COntext and VEracity... | | Experimental |
| 9 | shahriargolchin/DCQ | The official repository for the paper entitled "Data Contamination Quiz: A... | | Experimental |
| 10 | songxiaoshuai/progco | Official Implementation of "ProgCo: Program Helps Self-Correction of Large... | | Experimental |
| 11 | Wang-ML-Lab/interpretable-foundation-models | [ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy... | | Experimental |
| 12 | OpenMOSS/Say-I-Dont-Know | [ICML'2024] Can AI Assistants Know What They Don't Know? | | Experimental |
| 13 | plusnli/medical-knowledge-judgment | Codes and data for paper "Fact or Guesswork? Evaluating Large Language... | | Experimental |
| 14 | GS-Uni-Heidelberg/Paper-TheMoralizationCorpus | Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse... | | Experimental |
| 15 | Jonny-English/learn-interpretability | Bilingual Colab-first mechanistic interpretability course with paper... | | Experimental |
| 16 | Joe-b-20/CoreVital | Mechanistic interpretability toolkit for monitoring LLM internal health.... | | Experimental |
| 17 | Faisalse/LLM-reproducibility-audit | https://faisalse.github.io/LLM-reproducibility-audit/ | | Experimental |
| 18 | UKPLab/arxiv2025-misleading-visualizations | Code and datasets accompanying the arXiv preprint: "Protecting multimodal... | | Experimental |
| 19 | OSU-NLP-Group/AttrScore | Code, datasets, models for the paper "Automatic Evaluation of Attribution by... | | Experimental |
| 20 | amazon-science/ContraCLM | [ACL 2023] Code for ContraCLM: Contrastive Learning For Causal Language Model | | Experimental |
| 21 | LFhase/CausalCOAT | [NeurIPS 2024] Discovery of the Hidden World with Large Language Models | | Experimental |
| 22 | youzhaozhao/LLM-Heuristic-Graph-Coloring | Exploring LLM-assisted design of graph coloring heuristics through ... | | Experimental |
| 23 | Nearzero-S/Intuitive-MechInterp | Helping Humans Understand Our Processing | | Experimental |
| 24 | MozerWang/DEMO | [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained... | | Experimental |
| 25 | kasia-kobalczyk/guess_llm | Implementation of the probing models presented in the ICLR 2026 paper... | | Experimental |
| 26 | AColonnaDistria/llm2sql-consistency-analysis | LLM-to-SQL analysis tool designed to quantify non-determinism behavior of... | | Experimental |
| 27 | YuweiYin/SWI | SWI: Speaking with Intent in Large Language Models | | Experimental |
| 28 | ekaterinasviridova4/Investigating_implicitness_in_user_generated_argumentative_text | This repository contains the dataset and implementation details of the paper... | | Experimental |
| 29 | stefdesabbata/geospatial-mechanistic-interpretability | Geospatial Mechanistic Interpretability of Large Language Models | | Experimental |
| 30 | Strong-AI-Lab/Explanation-Generation | We introduce "ILearner-LLM" a framework that uses iterative enhancement with... | | Experimental |
| 31 | tbohne/saliency_kd | Saliency map-guided knowledge discovery for subclass identification with... | | Experimental |
| 32 | HamedBabaei/CoLLM | CoLLM: Consistency of Large Language Models in Knowledge Engineering | | Experimental |
| 33 | 12kimih/HiCUPID | [ACL 2025] Exploring the Potential of LLMs as Personalized Assistants:... | | Experimental |
| 34 | DataScienceUIBK/llm-reranking-generalization-study | How Good are LLM-based Rerankers? Accepted at EMNLP Findings 2025 | | Experimental |
| 35 | lindaCai1997/data-attribution | Scalable Gradient-Based Attribution of LLM Behaviors | | Experimental |
| 36 | Trustworthy-ML-Lab/Efficient-LLM-automated-interpretability | [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs | | Experimental |
| 37 | armlynobinguar/LLM-XAI-Papers | A curated collection of research papers on explainability and... | | Experimental |
| 38 | GovAIx/QualityModulation | [Nature Communications] Linguistic features of AI mis/disinformation and the... | | Experimental |
| 39 | Aniezka/xfact-fever | Official repository of FEVER@ACL 2025 paper "When Scale Meets Diversity:... | | Experimental |
| 40 | froge159/belief-project-sef | Activation-Space Interventions for Causal Control of Belief Representations... | | Experimental |
| 41 | jiangjiechen/uncommongen | Resources for our ACL 2023 paper: "Say What You Mean! Large Language Models... | | Experimental |
| 42 | braingpt-lovelab/backwards | Source code for | | Experimental |
| 43 | emanuelemessina/broken-morals | Moral copilot for high-stakes ethical decisions in business contexts | | Experimental |
| 44 | psunlpgroup/VerbosityLLM | This repository maintains dataset, predictions, and code for paper:... | | Experimental |
| 45 | k-randl/self-explaining_llms | Official implementation of the papers "Evaluating the Reliability of... | | Experimental |
| 46 | dennismstfc/building-the-soedermizer | Building the Söd★mizer: AI-Driven Machine Translation for Gender-Sensitive... | | Experimental |
| 47 | DAMO-NLP-SG/LLM-argumentation | [ACL2024] Exploring the Potential of Large Language Models in Computational... | | Experimental |
| 48 | gianniskalyvas/llm-posthoc-explainability | A study on post-hoc explainability in LLMs using counterfactual... | | Experimental |
| 49 | phvv-me/icip2025 | Official Repository for Vision Language Model Interpretability with Concept... | | Experimental |
| 50 | stvsever/aHFR_TokenSHAP | This repository implements an adaptive, hierarchy-aware Shapley method for... | | Experimental |
| 51 | ShiningLab/CON2LM | This repository is for the paper Word Surprisal Correlates with Sentential... | | Experimental |