LLM Bias Evaluation Transformer Models

There are 7 LLM bias evaluation models tracked. The highest-rated is google-deepmind/long-form-factuality, with a score of 48/100 and 672 stars.

Get all 7 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-bias-evaluation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
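The same request can be made from Python with only the standard library. This is a minimal sketch: the base URL and the `domain`, `subcategory`, and `limit` parameters come from the curl command, while the helper names `build_quality_url` and `fetch_quality` are hypothetical. It assumes the endpoint returns JSON, but the response schema isn't described on this page, so the decoded data is returned untouched.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str = "transformers",
                      subcategory: str = "llm-bias-evaluation",
                      limit: int = 20) -> str:
    """Build the dataset-quality URL with URL-encoded query parameters."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(url: str, timeout: float = 10.0):
    """GET the URL and decode the body as JSON.

    Assumption: the endpoint responds with JSON. Its field names are not
    documented here, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality(build_quality_url())` performs one live request, which counts toward the anonymous 100 requests/day limit. Keeping URL construction in its own function makes it easy to change filters (for example `limit=5`) without touching the network code.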

#	Model	Description	Score (/100)	Tier	Stars	Language
1	google-deepmind/long-form-factuality	Benchmarking long-form factuality in large language models. Original code...	48	Emerging	672	Python
2	sandylaker/ib-edl	Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)	30	Emerging	17	Python
3	nightdessert/Retrieval_Head	open-source code for paper: Retrieval Head Mechanistically Explains...	26	Experimental	236	Python
4	EternityYW/BiasEval-LLM-MentalHealth	Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models	25	Experimental	12	Jupyter Notebook
5	aigc-apps/PertEval	[NeurIPS '24 Spotlight] PertEval: Unveiling Real Knowledge Capacity of LLMs...	24	Experimental	14	Jupyter Notebook
6	bowen-upenn/llm_token_bias	[EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet...	23	Experimental	26	Python
7	fannie1208/FactTest	[ICML2025] "FactTest: Factuality Testing in Large Language Models with...	16	Experimental	9	Python
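Score and popularity don't always agree in this list: nightdessert/Retrieval_Head is ranked third by score but has the second-most stars. The sketch below shows one way to slice the table locally. The values are transcribed from the rows above (descriptions omitted), and the `Project` type and the `by_tier` / `most_starred` helpers are hypothetical names chosen for illustration, not part of any published API.

```python
from typing import List, NamedTuple


class Project(NamedTuple):
    rank: int      # position in the score-ordered table
    repo: str      # GitHub owner/name
    score: int     # quality score out of 100
    tier: str      # "Emerging" or "Experimental" in this list
    stars: int
    language: str


# Rows transcribed from the table above.
PROJECTS: List[Project] = [
    Project(1, "google-deepmind/long-form-factuality", 48, "Emerging", 672, "Python"),
    Project(2, "sandylaker/ib-edl", 30, "Emerging", 17, "Python"),
    Project(3, "nightdessert/Retrieval_Head", 26, "Experimental", 236, "Python"),
    Project(4, "EternityYW/BiasEval-LLM-MentalHealth", 25, "Experimental", 12, "Jupyter Notebook"),
    Project(5, "aigc-apps/PertEval", 24, "Experimental", 14, "Jupyter Notebook"),
    Project(6, "bowen-upenn/llm_token_bias", 23, "Experimental", 26, "Python"),
    Project(7, "fannie1208/FactTest", 16, "Experimental", 9, "Python"),
]


def by_tier(projects: List[Project], tier: str) -> List[Project]:
    """Keep only projects in the given tier, preserving table order."""
    return [p for p in projects if p.tier == tier]


def most_starred(projects: List[Project], n: int = 3) -> List[Project]:
    """Return the n projects with the most stars, highest first."""
    return sorted(projects, key=lambda p: p.stars, reverse=True)[:n]
```

For example, `[p.repo for p in most_starred(PROJECTS)]` lists long-form-factuality, Retrieval_Head, and llm_token_bias, whereas the top three by score are long-form-factuality, ib-edl, and Retrieval_Head.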