LLM Interpretability and Explainability Transformer Models

There are 29 LLM interpretability and explainability models tracked. The highest-rated is MadryLab/context-cite, which scores 48/100 and has 325 stars and 341 monthly downloads.

Get the projects as JSON (the request below sets limit=20, so it may return fewer than all 29):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-interpretability-explainability&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
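
The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the code simply decodes whatever the endpoint returns. Whether the endpoint accepts a `limit` above 20 is also an assumption to verify before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="llm-interpretability-explainability",
              limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response (hypothetical helper).

    The response structure is not specified in this listing, so the
    decoded object is returned unchanged for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_projects()` counts against the 100 requests/day anonymous quota, so it is worth caching the result locally rather than refetching on every run.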

#	Model (repository and description)	Score (out of 100)	Tier	Stars	Language
1	MadryLab/context-cite Attribute (or cite) statements generated by LLMs back to in-context information.	48	Emerging	325	Jupyter Notebook
2	microsoft/augmented-interpretable-models Interpretable and efficient predictors using pre-trained language models....	41	Emerging	44	Jupyter Notebook
3	Trustworthy-ML-Lab/CB-LLMs [ICLR 25] A novel framework for building intrinsically interpretable LLMs...	37	Emerging	31	Python
4	poloclub/LLM-Attributor LLM Attributor: Attribute LLM's Generated Text to Training Data	34	Emerging	76	Jupyter Notebook
5	nlpkeg/Know-MRI This is an official code for the [ACL 2025 Demo] paper: Know-MRI: A...	34	Emerging	14	Jupyter Notebook
6	UKPLab/5pils Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!"...	32	Emerging	45	Python
7	THUDM/LongCite LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA	32	Emerging	519	Python
8	yueyu1030/AttrPrompt [NeurIPS 2023] This is the code for the paper `Large Language Model as...	31	Emerging	156	Python
9	hao-ai-lab/Consistency_LLM [ICML 2024] CLLMs: Consistency Large Language Models	31	Emerging	413	Python
10	leap-laboratories/PIZZA An attribution library for LLMs	30	Emerging	46	Python
11	msakarvadia/memorization Localizing Memorized Sequences in Language Models	29	Experimental	20	Jupyter Notebook
12	AI4LIFE-GROUP/LLM_Explainer Code for paper: Are Large Language Models Post Hoc Explainers?	28	Experimental	34	Jupyter Notebook
13	ntt-dkiku/route-explainer The official implementation of "RouteExplainer: An Explanation Framework for...	28	Experimental	17	Python
14	itsqyh/Awesome-LMMs-Mechanistic-Interpretability A curated collection of resources focused on the Mechanistic...	27	Experimental	192	—
15	microsoft/MMLU-CF A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025]	26	Experimental	123	—
16	parameterlab/apricot Source code of "Calibrating Large Language Models Using Their Generations...	26	Experimental	22	Jupyter Notebook
17	yinzhangyue/SelfAware Do Large Language Models Know What They Don’t Know?	25	Experimental	102	Python
18	Trustworthy-ML-Lab/VLG-CBM [NeurIPS 24] A new training and evaluation framework for learning...	24	Experimental	29	Jupyter Notebook
19	jwergieluk/revllm RevLLM -- Reverse Engineering Tools for Large Language Models	24	Experimental	18	Python
20	llm-misinformation/llm-misinformation The dataset and code for the ICLR 2024 paper "Can LLM-Generated...	23	Experimental	81	Shell
21	Zhang-Yihao/Adversarial-Representation-Engineering Official implementation repository for the paper Towards General Conceptual...	23	Experimental	19	Python
22	salesforce/factualNLG Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing...	23	Experimental	61	Jupyter Notebook
23	yyy01/PAC The official implementation of the paper "Data Contamination Calibration for...	20	Experimental	16	Python
24	gsarti/pecore Materials for "Quantifying the Plausibility of Context Reliance in Neural...	20	Experimental	15	Jupyter Notebook
25	bgreenwell/statlingua Explain Statistical Output with Large Language Models	19	Experimental	10	R
26	Trustworthy-ML-Lab/Describe-and-Dissect [TMLR 25] An automated method for explaining complex neuron behaviors in...	19	Experimental	10	Jupyter Notebook
27	Human-Centric-Machine-Learning/counterfactual-llms Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024.	17	Experimental	32	Jupyter Notebook
28	AikyamLab/llm-memorization Understanding the memorization property of Large Language Models using Model...	14	Experimental	9	Python
29	zhaochen0110/LMLM Code and data for "Improving Temporal Generalization of Pre-trained Language...	12	Experimental	18	Python
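
The rows above can also be loaded into structured records for sorting or filtering. This is a sketch under stated assumptions: columns are taken to be tab-separated, and the repository name is taken to be the first space-delimited token of the Model column. The `Row` class and `parse_row` function are hypothetical names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Row:
    rank: int
    repo: str
    description: str
    score: int
    tier: str
    stars: int
    language: str


def parse_row(line):
    """Split one tab-separated table row into typed fields."""
    rank, model, score, tier, stars, language = line.rstrip("\n").split("\t")
    # Assumption: the repository name is everything before the first space.
    repo, _, description = model.partition(" ")
    return Row(int(rank), repo, description, int(score), tier, int(stars), language)


# Three rows copied from the table above.
SAMPLE = [
    "1\tMadryLab/context-cite Attribute (or cite) statements generated by LLMs back to in-context information.\t48\tEmerging\t325\tJupyter Notebook",
    "7\tTHUDM/LongCite LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA\t32\tEmerging\t519\tPython",
    "11\tmsakarvadia/memorization Localizing Memorized Sequences in Language Models\t29\tExperimental\t20\tJupyter Notebook",
]

rows = [parse_row(line) for line in SAMPLE]

# The table is ordered by score; re-sorting by stars gives a different ranking.
by_stars = sorted(rows, key=lambda r: r.stars, reverse=True)
```

Sorting by stars puts THUDM/LongCite (519 stars, score 32) ahead of MadryLab/context-cite (325 stars, score 48), which shows that the score is not simply a function of star count.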