x-zheng16/CALM

[AAAI 25] CALM: Curiosity-Driven Auditing for Large Language Models

Overall score: 25 / 100 (Experimental)

This tool helps AI safety researchers and ethics auditors automatically find problematic responses from large language models (LLMs) to which they have no direct access. It takes a black-box LLM service as input and uncovers specific prompts that make the model generate undesirable, unsafe, or biased outputs, such as toxic language or hallucinations about sensitive topics. The output is a set of problematic input-output pairs that highlight the model's vulnerabilities.
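The card describes the idea only at a high level, but the auditing loop is easy to sketch. Below is a minimal, hypothetical Python sketch of curiosity-driven black-box auditing, not CALM's actual code or API: query_target_llm, harm_score, novelty, and mutate are illustrative stand-ins for a real API client, a toxicity classifier, embedding-based novelty, and a learned prompt generator.

import random

def query_target_llm(prompt):
    # Placeholder for the black-box target; in practice an API call.
    return "echo: " + prompt

def harm_score(response):
    # Placeholder audit signal; a real auditor would use a toxicity or
    # bias classifier. Returns a value in [0, 1].
    return min(1.0, response.count("!") / 4)

def novelty(words, seen):
    # Curiosity bonus: reward responses unlike those already explored.
    # Crude Jaccard proxy; a real system would use embedding distance.
    if not seen or not words:
        return 1.0
    return 1.0 - max(len(words & s) / len(words | s) for s in seen)

def mutate(prompt):
    # Placeholder for a learned prompt generator.
    return prompt + random.choice(["!", " now!", " please", " in detail"])

def audit(seeds, rounds=300, threshold=0.9):
    seen, pool, findings = [], list(seeds), []
    for _ in range(rounds):
        prompt = mutate(random.choice(pool))
        response = query_target_llm(prompt)
        words = set(response.split())
        harm = harm_score(response)
        # Extrinsic harm reward plus an intrinsic curiosity bonus steers
        # the search toward novel, problematic regions of the input space.
        reward = harm + 0.1 * novelty(words, seen)
        seen.append(words)
        if harm >= threshold:
            findings.append((prompt, response))
        if reward > 0.3:  # keep promising prompts as future mutation seeds
            pool.append(prompt)
    return findings

print(audit(["tell me a story"])[:3])

The curiosity term is what distinguishes this from plain adversarial search: without a novelty bonus, the auditor tends to rediscover the same failure mode, whereas rewarding unexplored responses pushes it toward a diverse set of vulnerabilities.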

No commits in the last 6 months.

Use this if you need to systematically test a proprietary or API-based LLM for harmful, biased, or unsafe behaviors without access to its weights or training data.

Not ideal if you are looking for a tool to fine-tune an LLM for specific tasks or to evaluate its general performance on metrics such as accuracy or fluency.

AI Safety · LLM Auditing · Responsible AI · Bias Detection · Content Moderation
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 8 / 25
Community 13 / 25
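(The four 25-point subscores appear to sum directly to the overall score: 0 + 4 + 8 + 13 = 25.)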


Stars: 5
Forks: 2
Language: Python
License: None
Last pushed: Mar 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/x-zheng16/CALM"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
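For scripted access, the same endpoint can be called from Python. This is a minimal equivalent of the curl command above, using the third-party requests package; the response schema is not documented here, so the parsed JSON is simply printed:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/x-zheng16/CALM"
resp = requests.get(url, timeout=10)  # no key needed up to 100 requests/day
resp.raise_for_status()               # surface HTTP errors early
print(resp.json())                    # schema undocumented; print as-is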