brucelyu17/SC-TC-Bench
[FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus Traditional Chinese
This project evaluates how Large Language Models (LLMs) perform differently when prompted in Simplified versus Traditional Chinese. You provide prompts related to regional terms or names, and the system outputs an analysis of how various LLMs respond, highlighting potential biases. Researchers, language model developers, and fairness and ethics auditors can use it to understand cultural and regional disparities in LLM behavior.
Use this if you need to benchmark and understand biases in how LLMs process and respond to content in Simplified versus Traditional Chinese for tasks like term or name choice.
Not ideal if you are looking to fine-tune an LLM or want a general-purpose tool for multilingual content generation beyond bias analysis in Chinese variants.
Stars: 4
Forks: —
Language: Python
License: —
Category: —
Last pushed: Nov 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/brucelyu17/SC-TC-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
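For scripted access, the curl command above can be reproduced in Python. This is a minimal sketch using only the standard library; the endpoint URL is taken verbatim from the listing, but the response schema is not documented here, so the code prints whatever JSON the API returns rather than assuming specific fields.

```python
import json
import urllib.request

# Documented endpoint for this repository's quality data (copied from the listing).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/brucelyu17/SC-TC-Bench"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality record and decode it as JSON.

    The response schema is undocumented, so callers should inspect
    the returned dict rather than rely on specific keys.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(fetch_quality(), indent=2))
```

Anonymous access is limited to 100 requests/day, so batch jobs over many repositories should either throttle requests or use a free API key for the higher 1,000/day limit.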
Higher-rated alternatives
cvs-health/langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper...
gnai-creator/aletheion-llm-v2
Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know.
sandylaker/ib-edl
Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
MLD3/steerability
An open-source evaluation framework for measuring LLM steerability.