x-zheng16/CALM
[AAAI 25] CALM: Curiosity-Driven Auditing for LLMs
CALM helps AI safety researchers and ethics auditors automatically surface problematic responses from large language models (LLMs) to which they have no internal access. Given only black-box query access to an LLM service, it uncovers specific inputs that make the model generate undesirable, unsafe, or biased outputs, such as toxic language or hallucinations about sensitive topics. The output is a set of problematic input-output pairs that highlight the model's vulnerabilities.
No commits in the last 6 months.
Use this if you need to systematically test a proprietary or API-based LLM for harmful, biased, or unsafe behaviors without needing to access its internal code or training data.
Not ideal if you are looking for a tool to fine-tune an LLM for specific tasks or to evaluate its general performance metrics like accuracy or fluency.
Stars: 5
Forks: 2
Language: Python
License: —
Category:
Last pushed: Mar 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/x-zheng16/CALM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
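If you'd rather script against the endpoint than use curl, here is a minimal Python sketch using only the standard library. The endpoint path comes from the curl example above; the JSON response shape and the `Authorization` header name for keyed access are assumptions and may differ from the actual API.

```python
import json
import urllib.request
from typing import Optional

# Base path taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-record URL for a repository."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"


def fetch_quality(ecosystem: str, owner: str, repo: str,
                  api_key: Optional[str] = None) -> dict:
    """Fetch a repository's quality record as a dict.

    Pass api_key for the higher 1,000/day limit; the bearer-token
    header used here is an assumption, not documented behavior.
    """
    req = urllib.request.Request(quality_url(ecosystem, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For this repository, `fetch_quality("transformers", "x-zheng16", "CALM")` reproduces the curl request shown above.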
Higher-rated alternatives
cvs-health/langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper...
gnai-creator/aletheion-llm-v2
Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know.
sandylaker/ib-edl
Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
MLD3/steerability
An open-source evaluation framework for measuring LLM steerability.