LLM Bias Evaluation Tools
Tools and frameworks for detecting, measuring, and auditing biases in large language models across domains like mental health, hiring, news, and stereotypes. Includes bias benchmarks, evaluation metrics, and mitigation techniques. Does NOT include general fairness frameworks, bias in other ML models, or non-LLM applications.
There are 33 LLM bias evaluation tools tracked. One scores above 50 (established tier). The highest-rated is cvs-health/langfair at 63/100, with 255 stars and 661 monthly downloads.
Get all 33 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-bias-evaluation&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
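The same request can be made from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint and query parameters shown in the curl command above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response schema is not described on this page, so the parsed JSON is returned unchanged. The default `limit=20` mirrors the curl command; whether a larger value is accepted is not stated here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="llm-tools", subcategory="llm-bias-evaluation", limit=20):
    """Build the request URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Fetch the URL and parse the body as JSON.

    The response schema is an unknown here, so the parsed value is
    returned as-is rather than unpacked into specific fields.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_quality_url())` performs the network request; unauthenticated requests count against the 100 requests/day limit mentioned above.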
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | cvs-health/langfair: LangFair is a Python library for conducting use-case level LLM bias and... | 63 | Established |
| 2 | gnai-creator/aletheion-llm-v2: Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know. | | Emerging |
| 3 | bws82/biasclear: Structural bias detection and correction engine built on Persistent... | | Emerging |
| 4 | BetterForAll/HonestyMeter: HonestyMeter: An NLP-powered framework for evaluating objectivity and bias... | | Emerging |
| 5 | h-stefanidis/xc3-bias-mitigation-llm: Determining bias in LLMs with Jupyter notebooks and Python scripts. Includes... | | Experimental |
| 6 | MLD3/steerability: An open-source evaluation framework for measuring LLM steerability. | | Experimental |
| 7 | kazemihabib/Mitigating-Reasoning-LLM-Social-Bias: A novel approach to mitigating social bias in Large Language Models through... | | Experimental |
| 8 | KID-22/LLM-IR-Bias-Fairness-Survey: This is the repo for the survey of Bias and Fairness in IR with LLMs. | | Experimental |
| 9 | Hanpx20/SafeSwitch: Official code repository for the paper "Internal Activation as the Polar... | | Experimental |
| 10 | chandar-lab/CAIRO: We explain why fairness metrics don't correlate and propose CAIRO to make... | | Experimental |
| 11 | neha13rana/Stereotypical-Bias-Analyzer: In this project, we analyzed biases in ten domains using four datasets and... | | Experimental |
| 12 | faiyazabdullah/TranslationTangles: Uncovering Performance Gaps and Bias Patterns in LLM-Based Translations... | | Experimental |
| 13 | UltraDeep-Tech/lcb-bench: LLM Cognitive Bias Benchmark: 1,500 test cases measuring 30 cognitive biases... | | Experimental |
| 14 | fabthebest/EIC_Framework_Calibration: LLM decision-calibration engine based on Shannon Entropy and semantic... | | Experimental |
| 15 | xingbpshen/medical-calibration-fairness-mllm: [MICCAI 2025] The official implementation of the paper "Exposing and... | | Experimental |
| 16 | x-zheng16/CALM: [AAAI 25] CALM: Curiosity-Driven Auditing for LLMs | | Experimental |
| 17 | minnesotanlp/cobbler: Code and data for Koo et al's ACL 2024 paper "Benchmarking Cognitive Biases... | | Experimental |
| 18 | zhuohaoyu/KIEval: [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large... | | Experimental |
| 19 | HIIAYUSHI/LLM-analytical-agent: Self-Correcting LLM Analytical Agent for SQL reasoning, statistical... | | Experimental |
| 20 | gopi703/cultural-advice-bias: 🌍 Visualize cultural bias in AI therapy advice, revealing how local... | | Experimental |
| 21 | mtichikawa/llm-bias-detection: Research project detecting and quantifying demographic bias in language models | | Experimental |
| 22 | jwmke/BiasCompass: Using LLMs to detect bias in news articles. | | Experimental |
| 23 | joaoaleite/PASTEL: PASTEL (Prompted weAk Supervision wiTh crEdibility signaLs) is a weakly... | | Experimental |
| 24 | grecosalvatore/StereoBusters-GSI-Detect-Evalita2026: This repository contains the code of the team StereoBusters for the Evalita... | | Experimental |
| 25 | AndrewHeller17/Effect-of-Emotional-Framing-on-LLM-Performance: Evaluated the impact of emotional prompt framing on LLM reasoning accuracy... | | Experimental |
| 26 | Pikeras72/EQUITIA: Tool for the automatic assessment of biases in LLM models | | Experimental |
| 27 | d-lab/ecir26-qd-dense-vector-llm-rel-jud-bias-analysis: Code and experiments for Query–Document Dense Vectors for LLM Relevance... | | Experimental |
| 28 | luka-group/Causal-View-of-Entity-Bias: [EMNLP 2023] A Causal View of Entity Bias in (Large) Language Models | | Experimental |
| 29 | datos-Fundar/sesgos_LLM: How do LLM models "get it wrong"? | | Experimental |
| 30 | Trust4AI/GUARD-ME: AI-guided Evaluator for Bias Detection using Metamorphic Testing | | Experimental |
| 31 | tddschn/llm-biases: LLM Biases Research | | Experimental |
| 32 | Robert-Morabito/STOP: Repository for the paper STOP! Benchmarking Large Language Models with... | | Experimental |
| 33 | brucelyu17/SC-TC-Bench: [FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus... | | Experimental |