LLM Interpretability and Explainability Transformer Models
There are 29 LLM interpretability and explainability models tracked. The highest-rated is MadryLab/context-cite, at 48/100, with 325 stars and 341 monthly downloads.
Get all 29 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-interpretability-explainability&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
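The same request can be made from a script. The following is a minimal Python sketch that uses only the standard library and assumes the endpoint returns a JSON body. The helper names `build_url` and `fetch_projects` are hypothetical, and the response schema is not documented here, so the code does not assume any particular fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="llm-interpretability-explainability",
              limit=20):
    """Build the request URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and parse the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the request and counts against the daily quota. Whether a `limit` above 20 is accepted is not stated here, so the default mirrors the curl command.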
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | MadryLab/context-cite<br>Attribute (or cite) statements generated by LLMs back to in-context information. | | Emerging |
| 2 | microsoft/augmented-interpretable-models<br>Interpretable and efficient predictors using pre-trained language models.... | | Emerging |
| 3 | Trustworthy-ML-Lab/CB-LLMs<br>[ICLR 25] A novel framework for building intrinsically interpretable LLMs... | | Emerging |
| 4 | poloclub/LLM-Attributor<br>LLM Attributor: Attribute LLM's Generated Text to Training Data | | Emerging |
| 5 | nlpkeg/Know-MRI<br>This is an official code for the [ACL 2025 Demo] paper: Know-MRI: A... | | Emerging |
| 6 | UKPLab/5pils<br>Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!"... | | Emerging |
| 7 | THUDM/LongCite<br>LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | | Emerging |
| 8 | yueyu1030/AttrPrompt<br>[NeurIPS 2023] This is the code for the paper `Large Language Model as... | | Emerging |
| 9 | hao-ai-lab/Consistency_LLM<br>[ICML 2024] CLLMs: Consistency Large Language Models | | Emerging |
| 10 | leap-laboratories/PIZZA<br>An attribution library for LLMs | | Emerging |
| 11 | msakarvadia/memorization<br>Localizing Memorized Sequences in Language Models | | Experimental |
| 12 | AI4LIFE-GROUP/LLM_Explainer<br>Code for paper: Are Large Language Models Post Hoc Explainers? | | Experimental |
| 13 | ntt-dkiku/route-explainer<br>The official implementation of "RouteExplainer: An Explanation Framework for... | | Experimental |
| 14 | itsqyh/Awesome-LMMs-Mechanistic-Interpretability<br>A curated collection of resources focused on the Mechanistic... | | Experimental |
| 15 | microsoft/MMLU-CF<br>A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025] | | Experimental |
| 16 | parameterlab/apricot<br>Source code of "Calibrating Large Language Models Using Their Generations... | | Experimental |
| 17 | yinzhangyue/SelfAware<br>Do Large Language Models Know What They Don’t Know? | | Experimental |
| 18 | Trustworthy-ML-Lab/VLG-CBM<br>[NeurIPS 24] A new training and evaluation framework for learning... | | Experimental |
| 19 | jwergieluk/revllm<br>RevLLM -- Reverse Engineering Tools for Large Language Models | | Experimental |
| 20 | llm-misinformation/llm-misinformation<br>The dataset and code for the ICLR 2024 paper "Can LLM-Generated... | | Experimental |
| 21 | Zhang-Yihao/Adversarial-Representation-Engineering<br>Official implementation repository for the paper Towards General Conceptual... | | Experimental |
| 22 | salesforce/factualNLG<br>Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing... | | Experimental |
| 23 | yyy01/PAC<br>The official implementation of the paper "Data Contamination Calibration for... | | Experimental |
| 24 | gsarti/pecore<br>Materials for "Quantifying the Plausibility of Context Reliance in Neural... | | Experimental |
| 25 | bgreenwell/statlingua<br>Explain Statistical Output with Large Language Models | | Experimental |
| 26 | Trustworthy-ML-Lab/Describe-and-Dissect<br>[TMLR 25] An automated method for explaining complex neuron behaviors in... | | Experimental |
| 27 | Human-Centric-Machine-Learning/counterfactual-llms<br>Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. | | Experimental |
| 28 | AikyamLab/llm-memorization<br>Understanding the memorization property of Large Language Models using Model... | | Experimental |
| 29 | zhaochen0110/LMLM<br>Code and data for "Improving Temporal Generalization of Pre-trained Language... | | Experimental |