LLM Interpretability & Explainability Tools

Tools and frameworks for understanding, explaining, and visualizing how large language models make decisions through mechanistic analysis, post-hoc explanations, concept-based interpretability, and neuron-level attribution methods. Does NOT include general model evaluation, bias detection, hallucination mitigation, or knowledge editing.

There are 51 LLM interpretability & explainability tools tracked. One scores above 50 (Established tier). The highest-rated is filipnaudot/llmSHAP at 51/100, with 16 stars and 114 monthly downloads.

Get all 51 projects as JSON (note that the sample request below passes limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-interpretability-explainability&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
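
For scripted access, the same request can be issued from Python. The following is a minimal sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the decoded payload is returned unchanged.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Return the request URL with the same query parameters as the curl example."""
    params = {
        "domain": "llm-tools",
        "subcategory": "llm-interpretability-explainability",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON payload (performs one network request).

    Assumption: the endpoint returns a JSON body. Its structure is not
    described in the source, so no fields are accessed here.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` sends one keyless request, which counts against the daily quota. Passing a larger value such as `fetch_projects(limit=51)` may return the full list, but whether the endpoint accepts a limit above 20 is an assumption that has not been verified here.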

#	Tool (repository and description)	Score (/100)	Tier	Stars	Language
1	filipnaudot/llmSHAP llmSHAP: a multi-threaded explainability framework using Shapley values for...	51	Established	16	Python
2	microsoft/automated-brain-explanations Generating and validating natural-language explanations for the brain.	41	Emerging	63	Jupyter Notebook
3	CAS-SIAT-XinHai/CPsyCoun [ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and...	37	Emerging	218	Jupyter Notebook
4	wesg52/universal-neurons Universal Neurons in GPT2 Language Models	32	Emerging	30	Jupyter Notebook
5	ICTMCG/LLM-for-misinformation-research Paper list of misinformation research using (multi-modal) large language...	31	Emerging	321	—
6	phvv-me/frame-representation-hypothesis Official Repository for Frame Representation Hypothesis paper	29	Experimental	8	Jupyter Notebook
7	marcusm117/IdentityChain [ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large...	29	Experimental	10	Python
8	UKPLab/naacl2025-cove Code associated with the NAACL 2025 paper "COVE: COntext and VEracity...	28	Experimental	7	Python
9	shahriargolchin/DCQ The official repository for the paper entitled "Data Contamination Quiz: A...	28	Experimental	6	Python
10	songxiaoshuai/progco Official Implementation of "ProgCo: Program Helps Self-Correction of Large...	27	Experimental	5	Python
11	Wang-ML-Lab/interpretable-foundation-models [ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy...	24	Experimental	18	Python
12	OpenMOSS/Say-I-Dont-Know [ICML'2024] Can AI Assistants Know What They Don't Know?	23	Experimental	85	Python
13	plusnli/medical-knowledge-judgment Codes and data for paper "Fact or Guesswork? Evaluating Large Language...	23	Experimental	6	Python
14	GS-Uni-Heidelberg/Paper-TheMoralizationCorpus Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse...	23	Experimental	1	Jupyter Notebook
15	Jonny-English/learn-interpretability Bilingual Colab-first mechanistic interpretability course with paper...	22	Experimental	—	Jupyter Notebook
16	Joe-b-20/CoreVital Mechanistic interpretability toolkit for monitoring LLM internal health....	22	Experimental	—	Python
17	Faisalse/LLM-reproducibility-audit https://faisalse.github.io/LLM-reproducibility-audit/	22	Experimental	—	CSS
18	UKPLab/arxiv2025-misleading-visualizations Code and datasets accompanying the arXiv preprint: "Protecting multimodal...	22	Experimental	4	JavaScript
19	OSU-NLP-Group/AttrScore Code, datasets, models for the paper "Automatic Evaluation of Attribution by...	22	Experimental	56	Python
20	amazon-science/ContraCLM [ACL 2023] Code for ContraCLM: Contrastive Learning For Causal Language Model	22	Experimental	35	Python
21	LFhase/CausalCOAT [NeurIPS 2024] Discovery of the Hidden World with Large Language Models	20	Experimental	8	Jupyter Notebook
22	youzhaozhao/LLM-Heuristic-Graph-Coloring Exploring LLM-assisted design of graph coloring heuristics through ...	19	Experimental	—	Jupyter Notebook
23	Nearzero-S/Intuitive-MechInterp Helping Humans Understand Our Processing	19	Experimental	—	—
24	MozerWang/DEMO [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained...	19	Experimental	22	Python
25	kasia-kobalczyk/guess_llm Implementation of the probing models presented in the ICLR 2026 paper...	19	Experimental	—	Jupyter Notebook
26	AColonnaDistria/llm2sql-consistency-analysis LLM-to-SQL analysis tool designed to quantify non-determinism behavior of...	19	Experimental	—	Python
27	YuweiYin/SWI SWI: Speaking with Intent in Large Language Models	19	Experimental	6	Python
28	ekaterinasviridova4/Investigating_implicitness_in_user_generated_argumentative_text This repository contains the dataset and implementation details of the paper...	19	Experimental	—	Jupyter Notebook
29	stefdesabbata/geospatial-mechanistic-interpretability Geospatial Mechanistic Interpretability of Large Language Models	17	Experimental	18	Jupyter Notebook
30	Strong-AI-Lab/Explanation-Generation We introduce "ILearner-LLM" a framework that uses iterative enhancement with...	17	Experimental	2	Python
31	tbohne/saliency_kd Saliency map-guided knowledge discovery for subclass identification with...	16	Experimental	1	Jupyter Notebook
32	HamedBabaei/CoLLM CoLLM: Consistency of Large Language Models in Knowledge Engineering	16	Experimental	1	Python
33	12kimih/HiCUPID [ACL 2025] Exploring the Potential of LLMs as Personalized Assistants:...	16	Experimental	14	Python
34	DataScienceUIBK/llm-reranking-generalization-study How Good are LLM-based Rerankers? Accepted at EMNLP Findings 2025	16	Experimental	12	—
35	lindaCai1997/data-attribution Scalable Gradient-Based Attribution of LLM Behaviors	15	Experimental	5	Python
36	Trustworthy-ML-Lab/Efficient-LLM-automated-interpretability [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs	15	Experimental	6	Python
37	armlynobinguar/LLM-XAI-Papers A curated collection of research papers on explainability and...	15	Experimental	—	Python
38	GovAIx/QualityModulation [Nature Communications] Linguistic features of AI mis/disinformation and the...	15	Experimental	—	Jupyter Notebook
39	Aniezka/xfact-fever Official repository of FEVER@ACL 2025 paper "When Scale Meets Diversity:...	15	Experimental	7	—
40	froge159/belief-project-sef Activation-Space Interventions for Causal Control of Belief Representations...	14	Experimental	—	Jupyter Notebook
41	jiangjiechen/uncommongen Resources for our ACL 2023 paper: "Say What You Mean! Large Language Models...	14	Experimental	9	Python
42	braingpt-lovelab/backwards Source code for	14	Experimental	4	Jupyter Notebook
43	emanuelemessina/broken-morals Moral copilot for high-stakes ethical decisions in business contexts	13	Experimental	—	TeX
44	psunlpgroup/VerbosityLLM This repository maintains dataset, predictions, and code for paper:...	13	Experimental	5	Python
45	k-randl/self-explaining_llms Official implementation of the papers "Evaluating the Reliability of...	12	Experimental	1	Jupyter Notebook
46	dennismstfc/building-the-soedermizer Building the Söd★mizer: AI-Driven Machine Translation for Gender-Sensitive...	12	Experimental	3	Python
47	DAMO-NLP-SG/LLM-argumentation [ACL2024] Exploring the Potential of Large Language Models in Computational...	12	Experimental	17	Python
48	gianniskalyvas/llm-posthoc-explainability A study on post-hoc explainability in LLMs using counterfactual...	11	Experimental	—	Jupyter Notebook
49	phvv-me/icip2025 Official Repository for Vision Language Model Interpretability with Concept...	11	Experimental	—	Jupyter Notebook
50	stvsever/aHFR_TokenSHAP This repository implements an adaptive, hierarchy-aware Shapley method for...	11	Experimental	—	Python
51	ShiningLab/CON2LM This repository is for the paper Word Surprisal Correlates with Sentential...	11	Experimental	—	Jupyter Notebook