LLM Interpretability and Explainability Transformer Models

This page tracks 29 LLM interpretability and explainability projects. The highest-rated is MadryLab/context-cite at 48/100, with 325 stars and 341 monthly downloads.

Get all 29 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-interpretability-explainability&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
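For scripted use, the same request can be made from Python with only the standard library. The following is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the decoded value is returned as-is. The curl command above passes `limit=20`; whether a larger value returns all 29 projects is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="llm-interpretability-explainability",
              limit=20):
    """Build the request URL with the same query parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and decode the JSON body.

    The response schema is an unknown here, so nothing is assumed
    about its keys; the parsed value is returned unchanged.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request and counts against the daily quota):
#   data = fetch_projects(build_url())
#   print(json.dumps(data, indent=2)[:1000])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 daily keyless requests.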

| # | Model | Description | Score | Tier |
| --- | --- | --- | --- | --- |
| 1 | MadryLab/context-cite | Attribute (or cite) statements generated by LLMs back to in-context information. | 48 | Emerging |
| 2 | microsoft/augmented-interpretable-models | Interpretable and efficient predictors using pre-trained language models.... | 41 | Emerging |
| 3 | Trustworthy-ML-Lab/CB-LLMs | [ICLR 25] A novel framework for building intrinsically interpretable LLMs... | 37 | Emerging |
| 4 | poloclub/LLM-Attributor | LLM Attributor: Attribute LLM's Generated Text to Training Data | 34 | Emerging |
| 5 | nlpkeg/Know-MRI | This is an official code for the [ACL 2025 Demo] paper: Know-MRI: A... | 34 | Emerging |
| 6 | UKPLab/5pils | Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!"... | 32 | Emerging |
| 7 | THUDM/LongCite | LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | 32 | Emerging |
| 8 | yueyu1030/AttrPrompt | [NeurIPS 2023] This is the code for the paper `Large Language Model as... | 31 | Emerging |
| 9 | hao-ai-lab/Consistency_LLM | [ICML 2024] CLLMs: Consistency Large Language Models | 31 | Emerging |
| 10 | leap-laboratories/PIZZA | An attribution library for LLMs | 30 | Emerging |
| 11 | msakarvadia/memorization | Localizing Memorized Sequences in Language Models | 29 | Experimental |
| 12 | AI4LIFE-GROUP/LLM_Explainer | Code for paper: Are Large Language Models Post Hoc Explainers? | 28 | Experimental |
| 13 | ntt-dkiku/route-explainer | The official implementation of "RouteExplainer: An Explanation Framework for... | 28 | Experimental |
| 14 | itsqyh/Awesome-LMMs-Mechanistic-Interpretability | A curated collection of resources focused on the Mechanistic... | 27 | Experimental |
| 15 | microsoft/MMLU-CF | A Contamination-free Multi-task Language Understanding Benchmark [Official, ACL 2025] | 26 | Experimental |
| 16 | parameterlab/apricot | Source code of "Calibrating Large Language Models Using Their Generations... | 26 | Experimental |
| 17 | yinzhangyue/SelfAware | Do Large Language Models Know What They Don’t Know? | 25 | Experimental |
| 18 | Trustworthy-ML-Lab/VLG-CBM | [NeurIPS 24] A new training and evaluation framework for learning... | 24 | Experimental |
| 19 | jwergieluk/revllm | RevLLM -- Reverse Engineering Tools for Large Language Models | 24 | Experimental |
| 20 | llm-misinformation/llm-misinformation | The dataset and code for the ICLR 2024 paper "Can LLM-Generated... | 23 | Experimental |
| 21 | Zhang-Yihao/Adversarial-Representation-Engineering | Official implementation repository for the paper Towards General Conceptual... | 23 | Experimental |
| 22 | salesforce/factualNLG | Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing... | 23 | Experimental |
| 23 | yyy01/PAC | The official implementation of the paper "Data Contamination Calibration for... | 20 | Experimental |
| 24 | gsarti/pecore | Materials for "Quantifying the Plausibility of Context Reliance in Neural... | 20 | Experimental |
| 25 | bgreenwell/statlingua | Explain Statistical Output with Large Language Models | 19 | Experimental |
| 26 | Trustworthy-ML-Lab/Describe-and-Dissect | [TMLR 25] An automated method for explaining complex neuron behaviors in... | 19 | Experimental |
| 27 | Human-Centric-Machine-Learning/counterfactual-llms | Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. | 17 | Experimental |
| 28 | AikyamLab/llm-memorization | Understanding the memorization property of Large Language Models using Model... | 14 | Experimental |
| 29 | zhaochen0110/LMLM | Code and data for "Improving Temporal Generalization of Pre-trained Language... | 12 | Experimental |