brucelyu17/SC-TC-Bench
[FAccT '25] Characterizing Bias: Benchmarking LLMs in Simplified versus Traditional Chinese
This project evaluates how Large Language Models (LLMs) perform differently when prompted in Simplified versus Traditional Chinese. You provide prompts related to regional terms or names, and the system outputs an analysis of how various LLMs respond, highlighting potential biases. Researchers, language model developers, and fairness and ethics auditors can use it to understand cultural and regional disparities in LLM behavior.
Use this if you need to benchmark and understand biases in how LLMs process and respond to content in Simplified versus Traditional Chinese for tasks like term or name choice.
Not ideal if you are looking to fine-tune an LLM or want a general-purpose tool for multilingual content generation beyond bias analysis in Chinese variants.
Stars: 4
Forks: —
Language: Python
License: —
Category: —
Last pushed: Nov 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/brucelyu17/SC-TC-Bench"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
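For scripted access, the curl command above can be reproduced in Python. This is a minimal sketch using only the standard library; the endpoint URL is taken verbatim from the listing, but the response schema is not documented here, so the code prints whatever JSON the API returns rather than assuming specific fields.

```python
import json
import urllib.request

# Documented endpoint for this repository's quality data (copied from the listing).
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/brucelyu17/SC-TC-Bench"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality record and decode it as JSON.

    The response schema is undocumented, so callers should inspect
    the returned dict rather than rely on specific keys.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(fetch_quality(), indent=2))
```

Anonymous access is limited to 100 requests/day, so batch jobs over many repositories should either throttle requests or use a free API key for the higher 1,000/day limit.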
Higher-rated alternatives
cvs-health/langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
google-deepmind/long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper...
gnai-creator/aletheion-llm-v2
Decoder-only LLM with integrated epistemic tomography. Knows what it doesn't know.
sandylaker/ib-edl
Calibrating LLMs with Information-Theoretic Evidential Deep Learning (ICLR 2025)
MLD3/steerability
An open-source evaluation framework for measuring LLM steerability.