vectara/hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents


Established

Uses Vectara's HHEM-2.3 hallucination evaluation model to score LLMs on a curated, non-public dataset of 7,700+ documents spanning news, science, medicine, and legal domains, with varying complexity (50–24K words). Rankings track hallucination rate, factual consistency rate, answer rate, and summary length, and results are updated regularly as models evolve. Integrates with Hugging Face for interactive exploration and provides an open-source HHEM-2.1 variant for reproducible research.
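To make the ranked metrics concrete, here is a minimal sketch of how they could be derived from per-document HHEM scores. It is not the leaderboard's own code. Several details are assumptions: the 0.5 cut-off for counting a summary as a hallucination, the choice to compute rates only over answered documents, marking a declined document as `(None, None)`, and measuring summary length in whitespace-separated words. The function name `leaderboard_metrics` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Assumption: a summary whose HHEM factual-consistency score is below this
# threshold is counted as a hallucination.
HALLUCINATION_THRESHOLD = 0.5


def leaderboard_metrics(
    results: Sequence[Tuple[Optional[str], Optional[float]]],
    threshold: float = HALLUCINATION_THRESHOLD,
) -> dict:
    """Compute leaderboard-style metrics (hypothetical helper).

    `results` holds one (summary, hhem_score) pair per source document;
    (None, None) marks a document the model declined to summarize.
    """
    if not results:
        raise ValueError("no results to score")

    # Keep only documents the model actually answered.
    answered = [(s, score) for s, score in results if s is not None]
    if not answered:
        raise ValueError("model answered no documents")

    # Count answered summaries scored below the consistency threshold.
    hallucinated = sum(1 for _, score in answered if score < threshold)
    hallucination_rate = 100 * hallucinated / len(answered)

    return {
        "hallucination_rate": hallucination_rate,
        # Factual consistency is taken as the complement of the hallucination rate.
        "factual_consistency_rate": 100 - hallucination_rate,
        # Share of all documents for which a summary was produced.
        "answer_rate": 100 * len(answered) / len(results),
        # Assumption: length measured in whitespace-separated words.
        "avg_summary_words": sum(len(s.split()) for s, _ in answered) / len(answered),
    }


if __name__ == "__main__":
    demo = [
        ("The cat sat.", 0.9),
        ("A dog flew to Mars.", 0.2),
        (None, None),  # model declined this document
        ("Sales rose 5%.", 0.7),
    ]
    print(leaderboard_metrics(demo))
```

Excluding declined documents from the hallucination rate keeps a model from lowering its score by refusing to answer; the separate answer rate is what exposes that behavior.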

3,122 stars. Actively maintained with 9 commits in the last 30 days.

No package. No dependents.

Maintenance 20 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25


Stars: 3,122

Language: Python

License: Apache-2.0


Related tools

PKU-YuanGroup/Hallucination-Attack

Attack to induce hallucinations in LLMs

Amirhosein-gh98/Gnosis

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

NishilBalar/Awesome-LVLM-Hallucination

Up-to-date curated list of state-of-the-art large vision-language model hallucinations...

MemTensor/HaluMem

HaluMem is the first operation-level hallucination evaluation benchmark tailored to agent memory systems.

intuit/sac3

Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via...
