TheBuleGanteng/interpretability-prototyping

This project is an educational exploration of Large Language Model (LLM) interpretability techniques, focusing on Sparse Autoencoders (SAEs) as demonstrated in Anthropic's "Scaling Monosemanticity" research.

/ 100

Experimental

No Package · No Dependents

Maintenance 13 / 25

Adoption 0 / 25

Maturity 9 / 25

Community 0 / 25

Stars

—

Forks

—

Language

Jupyter Notebook

License

MIT

Category

explainability-interpretability-frameworks

Last pushed

Mar 18, 2026

Commits (30d)

Explainability Interpretability Frameworks · 223 frameworks

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/TheBuleGanteng/interpretability-prototyping"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
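
The same request can be made from Python. The following is a minimal sketch using only the standard library. The base URL is taken from the curl command above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base URL copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score URL for a GitHub owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repo.

    Assumption: the response body is JSON. Its fields are not documented
    on this page, so the parsed value is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("TheBuleGanteng", "interpretability-prototyping")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit.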

Higher-rated alternatives

obss/sahi

Framework agnostic sliced/tiled inference + interactive ui + error analysis plots

MAIF/shapash

🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent...

SeldonIO/alibi

Algorithms for explaining machine learning models

understandable-machine-intelligence-lab/Quantus

Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations

interpretml/interpret

Fit interpretable models. Explain blackbox machine learning.
