TheBuleGanteng/interpretability-prototyping
This project is an educational exploration of interpretability techniques for Large Language Models (LLMs), focusing on Sparse Autoencoders (SAEs) as demonstrated in Anthropic's research paper "Scaling Monosemanticity".
Stars: —
Forks: —
Language: Jupyter Notebook
License: MIT
Last pushed: Mar 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/TheBuleGanteng/interpretability-prototyping"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
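The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repo is inferred from the curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Join the three path segments onto the base URL, percent-encoding each."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send an unauthenticated GET request and decode the body.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(build_quality_url(
        "ml-frameworks", "TheBuleGanteng", "interpretability-prototyping"
    ))
```

Since the keyless tier is limited to 100 requests/day, it may be worth caching the result of `fetch_quality` rather than calling it repeatedly for the same repository.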
Higher-rated alternatives
obss/sahi
Framework agnostic sliced/tiled inference + interactive ui + error analysis plots
MAIF/shapash
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent...
SeldonIO/alibi
Algorithms for explaining machine learning models
understandable-machine-intelligence-lab/Quantus
Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations
interpretml/interpret
Fit interpretable models. Explain blackbox machine learning.