LLM Bias Evaluation Tools

Tools and frameworks for detecting, measuring, and auditing biases in large language models across domains like mental health, hiring, news, and stereotypes. Includes bias benchmarks, evaluation metrics, and mitigation techniques. Does NOT include general fairness frameworks, bias in other ML models, or non-LLM applications.

There are 33 LLM bias evaluation tools tracked. One scores above 50 (Established tier). The highest-rated is cvs-health/langfair at 63/100, with 255 stars and 661 monthly downloads.

Get all 33 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-bias-evaluation&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
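The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response schema is not documented on this page, and the example URL uses `limit=20`, so whether a larger `limit` returns all 33 projects is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="llm-bias-evaluation", limit=20):
    """Build the query URL with the same parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response (hypothetical helper).

    The shape of the returned JSON is not specified on this page,
    so the decoded object is returned as-is.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_url())
    # Uncomment to call the live API (counts against the daily request quota):
    # print(json.dumps(fetch_projects(), indent=2))
```

Building the URL separately from the network call makes it easy to inspect the request before spending one of the 100 keyless requests per day.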

# | Tool | Description | Score | Tier
1 | cvs-health/langfair | LangFair is a Python library for conducting use-case level LLM bias and... | 63 | Established
2 | gnai-creator/aletheion-llm-v2 | Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know. | 36 | Emerging
3 | bws82/biasclear | Structural bias detection and correction engine built on Persistent... | 35 | Emerging
4 | BetterForAll/HonestyMeter | HonestyMeter: An NLP-powered framework for evaluating objectivity and bias... | 30 | Emerging
5 | h-stefanidis/xc3-bias-mitigation-llm | Determining bias in LLMs with Jupyter notebooks and Python scripts. Includes... | 28 | Experimental
6 | MLD3/steerability | An open-source evaluation framework for measuring LLM steerability. | 26 | Experimental
7 | kazemihabib/Mitigating-Reasoning-LLM-Social-Bias | A novel approach to mitigating social bias in Large Language Models through... | 26 | Experimental
8 | KID-22/LLM-IR-Bias-Fairness-Survey | This is the repo for the survey of Bias and Fairness in IR with LLMs. | 26 | Experimental
9 | Hanpx20/SafeSwitch | Official code repository for the paper "Internal Activation as the Polar... | 23 | Experimental
10 | chandar-lab/CAIRO | We explain why fairness metrics don't correlate and propose CAIRO to make... | 23 | Experimental
11 | neha13rana/Stereotypical-Bias-Analyzer | In this project, we analyzed biases in ten domains using four datasets and... | 23 | Experimental
12 | faiyazabdullah/TranslationTangles | Uncovering Performance Gaps and Bias Patterns in LLM-Based Translations... | 22 | Experimental
13 | UltraDeep-Tech/lcb-bench | LLM Cognitive Bias Benchmark: 1,500 test cases measuring 30 cognitive biases... | 22 | Experimental
14 | fabthebest/EIC_Framework_Calibration | LLM decision-calibration engine based on Shannon Entropy and semantic... | 19 | Experimental
15 | xingbpshen/medical-calibration-fairness-mllm | [MICCAI 2025] The official implementation of the paper "Exposing and... | 19 | Experimental
16 | x-zheng16/CALM | [AAAI 25] CALM: Curiosity-Driven Auditing for LLMs | 18 | Experimental
17 | minnesotanlp/cobbler | Code and data for Koo et al's ACL 2024 paper "Benchmarking Cognitive Biases... | 15 | Experimental
18 | zhuohaoyu/KIEval | [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large... | 14 | Experimental
19 | HIIAYUSHI/LLM-analytical-agent | Self-Correcting LLM Analytical Agent for SQL reasoning, statistical... | 14 | Experimental
20 | gopi703/cultural-advice-bias | 🌍 Visualize cultural bias in AI therapy advice, revealing how local... | 14 | Experimental
21 | mtichikawa/llm-bias-detection | Research project detecting and quantifying demographic bias in language models | 14 | Experimental
22 | jwmke/BiasCompass | Using LLMs to detect bias in news articles. | 13 | Experimental
23 | joaoaleite/PASTEL | PASTEL (Prompted weAk Supervision wiTh crEdibility signaLs) is a weakly... | 12 | Experimental
24 | grecosalvatore/StereoBusters-GSI-Detect-Evalita2026 | This repository contains the code of the team StereoBusters for the Evalita... | 12 | Experimental
25 | AndrewHeller17/Effect-of-Emotional-Framing-on-LLM-Performance | Evaluated the impact of emotional prompt framing on LLM reasoning accuracy... | 11 | Experimental
26 | Pikeras72/EQUITIA | Tool for the automatic assessment of biases in LLM models | 11 | Experimental
27 | d-lab/ecir26-qd-dense-vector-llm-rel-jud-bias-analysis | Code and experiments for Query–Document Dense Vectors for LLM Relevance... | 11 | Experimental
28 | luka-group/Causal-View-of-Entity-Bias | [EMNLP 2023] A Causal View of Entity Bias in (Large) Language Models | 11 | Experimental
29 | datos-Fundar/sesgos_LLM | How do LLMs "get it wrong"? | 11 | Experimental
30 | Trust4AI/GUARD-ME | AI-guided Evaluator for Bias Detection using Metamorphic Testing | 11 | Experimental
31 | tddschn/llm-biases | LLM Biases Research | 10 | Experimental
32 | Robert-Morabito/STOP | Repository for the paper STOP! Benchmarking Large Language Models with... | 10 | Experimental
33 | brucelyu17/SC-TC-Bench | [FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus... | 10 | Experimental