Bias Measurement Evaluation NLP Tools

Tools and datasets for detecting, measuring, and quantifying bias in NLP models and language systems. Includes benchmarks, metrics, and evaluation methods for assessing fairness across different demographic groups and intersectional categories. Does NOT include general bias mitigation techniques, debiasing methods without evaluation focus, or application-specific bias detection (e.g., hate speech or toxic comment detection).

There are 37 bias measurement evaluation tools tracked. The highest-rated is dccuchile/wefe at 46/100 with 183 stars.

Get the projects as JSON (the request below sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=bias-measurement-evaluation&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
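For scripted access, the curl request above can be sketched in Python using only the standard library. The endpoint and query parameters are copied from that command; the function names `build_url` and `fetch_projects` are hypothetical helpers introduced for illustration. The response schema is not documented here, so the sketch returns the parsed JSON as-is rather than assuming any field names. The source command passes `limit=20`; whether the API accepts a larger value to return all 37 projects in one call is an assumption this page does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="bias-measurement-evaluation", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the listing and return the decoded JSON without assuming its schema.

    No API key is sent, so the call counts against the keyless daily quota.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one network request, so inspect the returned structure once and cache it locally rather than polling, given the 100 requests/day keyless limit.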

#	Tool	Score	Tier	Stars	Language
1	dccuchile/wefe WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework...	46	Emerging	183	Python
2	dreji18/Fairness-in-AI Detecting Bias and ensuring Fairness in AI solutions	36	Emerging	102	Jupyter Notebook
3	amazon-science/bold Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in...	35	Emerging	87	—
4	dhfbk/variationist Variationist: Exploring Multifaceted Variation and Bias in Written Language...	33	Emerging	10	Python
5	soarsmu/BiasFinder BiasFinder | IEEE TSE | Metamorphic Test Generation to Uncover Bias for...	30	Emerging	11	Jupyter Notebook
6	microsoft/SafeNLP Safety Score for Pre-Trained Language Models	26	Experimental	96	Python
7	grecosalvatore/nlpguard NLPGuard: A Framework for Mitigating the use of Protected Attributes in NLP	25	Experimental	5	Python
8	darenr/gender-bias Real-time JavaScript gender bias detector	25	Experimental	4	JavaScript
9	jasonshaoshun/SAL code for "Spectral Removal of Guarded Attribute Information"	22	Experimental	7	Jupyter Notebook
10	princeton-nlp/MABEL EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data"...	22	Experimental	38	Python
11	kidologi/AI_lForge 🤖 Detect and mitigate bias in machine learning with the AI_lForge toolkit,...	22	Experimental	—	Python
12	CAMeL-Lab/gender-rewriting-shared-task Evaluation code and data for the gender rewriting shared task	22	Experimental	1	Python
13	krangelie/bias-in-german-nlg Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies...	20	Experimental	16	Jupyter Notebook
14	feyzaakyurek/bbnli Bias Benchmark for Natural Language Inference. Code repo for the Findings of...	20	Experimental	15	Python
15	tinotavingeyi-droid/ubuntu-xai An open-source research platform for evaluating AI bias, fairness, and...	19	Experimental	—	TypeScript
16	candacelax/bias-in-vision-and-language Code for paper "Measuring Social Biases in Grounded Vision and Language Embeddings"	19	Experimental	9	Shell
17	erica-dessi/Modelli-linguistici-e-discriminazione-nascosta-il-bias-di-genere-nelle-professioni This thesis explores the phenomenon of gender bias in Large Language...	19	Experimental	—	—
18	cs329yangzhong/WIKIBIAS Code and data for EMNLP2021 paper: WIKIBIAS: Detecting Multi-Span Subjective...	17	Experimental	4	Python
19	yipenglai/Wikipedia-Gender-Bias Measure gender bias in English Wikipedia biographies through text analysis in R	17	Experimental	4	R
20	sathvikn/word_embedding_bias Companion to my blog post: How Biases in Language get Perpetuated by Technology	16	Experimental	4	Jupyter Notebook
21	minnesotanlp/Quantifying-Annotation-Disagreement Official implementation of Wan et al.'s paper "Everyone's Voice Matters:...	15	Experimental	6	Jupyter Notebook
22	VSteinborn/s_jsd-multilingual-bias Code and data for the paper "An Information-Theoretic Approach and Dataset...	15	Experimental	5	Python
23	google-research-datasets/nlp-fairness-for-india Contains data resources to replicate results from the paper...	14	Experimental	12	—
24	iampeti/Thesis_Gender_Bias 📊 Investigate gender bias in clinical research through statistical analysis...	14	Experimental	—	R
25	PieTempesti98/biases_in_hiring_decisions Review of the most studied biases in the hiring process made by Pietro...	14	Experimental	1	Jupyter Notebook
26	groovychoons/GlobalBias The official repo for the GlobalBias dataset and associated paper: 'Who is...	13	Experimental	5	Jupyter Notebook
27	jasonshaoshun/AMSAL code for "Erasure of Unaligned Attributes from Neural Representations"	13	Experimental	7	Python
28	hyoungjo/lipstick-on-a-pig Debiasing methods on contextualised embeddings are ineffective - CS475	13	Experimental	—	Jupyter Notebook
29	martinsjaavik/llm-bias-norwegian Master thesis on subtler biases	12	Experimental	1	Python
30	feyzaakyurek/bias-textgen Code for the paper "Challenges in Measuring Bias in Open-Ended Language...	12	Experimental	4	Python
31	CAMeL-Lab/gender-rewriting Code, models, and data for "User-Centric Gender Rewriting". NAACL 2022.	12	Experimental	3	Python
32	venkatasg/interpersonal-bias Code and data for the paper 'How people talk about each other: Modeling...	11	Experimental	2	Jupyter Notebook
33	Ahmad-AlSubaie/CS499-DL-debaising Repository for research done into the methods used to debias ML models....	11	Experimental	2	Jupyter Notebook
34	B-VARUN-REDDY/FairwAI-Bias-Detection Submission for the FairwAI Hospitality Intern Challenge. This project...	11	Experimental	—	Python
35	asimokby/formality-bias-analysis This repo contains the annotations and other artifacts of the paper titled:...	10	Experimental	1	—
36	VSteinborn/politeness-attacks Code and data for the paper "Politeness Stereotypes and Attack Vectors:...	10	Experimental	1	Python
37	iamshnoo/soc_bias Reproduction for NAACL paper on Socially Aware Bias Measurements for Hindi	10	Experimental	1	Python