x-zheng16/CALM
[AAAI 25] CALM: Curiosity-Driven Auditing for LLMs
CALM helps AI safety researchers and ethics auditors automatically surface problematic responses from large language models (LLMs) to which they have no internal access. Given only black-box query access to an LLM service, it uncovers specific inputs that make the model generate undesirable, unsafe, or biased outputs, such as toxic language or hallucinations about sensitive topics. The output is a set of problematic input-output pairs that highlight the model's vulnerabilities.
No commits in the last 6 months.
Use this if you need to systematically test a proprietary or API-based LLM for harmful, biased, or unsafe behaviors without needing to access its internal code or training data.
Not ideal if you are looking for a tool to fine-tune an LLM for specific tasks or to evaluate its general performance metrics like accuracy or fluency.
Stars: 5
Forks: 2
Language: Python
License: —
Category:
Last pushed: Mar 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/x-zheng16/CALM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
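If you'd rather script against the endpoint than use curl, here is a minimal Python sketch using only the standard library. The endpoint path comes from the curl example above; the JSON response shape and the `Authorization` header name for keyed access are assumptions and may differ from the actual API.

```python
import json
import urllib.request
from typing import Optional

# Base path taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-record URL for a repository."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"


def fetch_quality(ecosystem: str, owner: str, repo: str,
                  api_key: Optional[str] = None) -> dict:
    """Fetch a repository's quality record as a dict.

    Pass api_key for the higher 1,000/day limit; the bearer-token
    header used here is an assumption, not documented behavior.
    """
    req = urllib.request.Request(quality_url(ecosystem, owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For this repository, `fetch_quality("transformers", "x-zheng16", "CALM")` reproduces the curl request shown above.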
Higher-rated alternatives
cvs-health/langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper...
gnai-creator/aletheion-llm-v2
Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know.
sandylaker/ib-edl
Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
MLD3/steerability
An open-source evaluation framework for measuring LLM steerability.