lechmazur/confabulations

Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.

29 / 100 (Experimental)

Evaluates RAG systems on two measures: the confabulation rate (false answers to 201 human-verified adversarial questions that the provided documents cannot answer) and the non-response rate (refusals across 2,612 answerable questions). Tracking both reveals performance tradeoffs that simple accuracy metrics miss. The two scores are combined into weighted rankings, exposing models that avoid hallucinations by refusing to answer entirely. Models from major LLM providers are tested at temperature 0, and scoring does not rely on model-based evaluation, which the author found introduces significant bias.
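The description does not say how the two rates are weighted. As a rough illustration only, the sketch below combines them as a weighted average in which lower is better. `ModelResult`, `combined_score`, `rank`, the `confab_weight` parameter, and the example numbers are all hypothetical; this is not the benchmark's actual formula or data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ModelResult:
    name: str
    confabulation_rate: float  # % of unanswerable questions answered anyway (lower is better)
    non_response_rate: float   # % of answerable questions refused (lower is better)


def combined_score(r: ModelResult, confab_weight: float = 0.5) -> float:
    """Weighted average of the two error rates; lower is better.

    confab_weight is a hypothetical parameter, not the benchmark's real weighting.
    """
    if not 0.0 <= confab_weight <= 1.0:
        raise ValueError("confab_weight must be in [0, 1]")
    return confab_weight * r.confabulation_rate + (1 - confab_weight) * r.non_response_rate


def rank(results: Iterable[ModelResult], confab_weight: float = 0.5) -> List[ModelResult]:
    """Sort models from best (lowest combined score) to worst."""
    return sorted(results, key=lambda r: combined_score(r, confab_weight))
```

With equal weights, a hypothetical model that refuses everything (0% confabulation, 100% non-response) scores 50, worse than one with 20% confabulation and 10% non-response (15). Ranking on a single rate would have put the always-refusing model first; combining the two exposes it.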

243 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 9 / 25
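The four sub-scores, each out of 25, appear to add up to the overall 29 / 100 rating. This is inferred from the numbers on this page, not from documentation of the scoring method; a quick check:

```python
# Sub-scores as listed on this page; each category is out of 25.
SUBSCORES = {"Maintenance": 2, "Adoption": 10, "Maturity": 8, "Community": 9}
MAX_PER_CATEGORY = 25

total = sum(SUBSCORES.values())
maximum = MAX_PER_CATEGORY * len(SUBSCORES)
print(f"{total} / {maximum}")  # 29 / 100
```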


Stars: 243
Forks: 9
Language: HTML
License: None
Last pushed: Aug 07, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/lechmazur/confabulations"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
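The same endpoint can also be queried from Python's standard library. This is a minimal sketch that assumes the endpoint returns JSON; the helper names `quality_url` and `fetch_quality` are hypothetical, and it uses the keyless tier because the page does not say how an API key is supplied.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality URL (same shape as the curl example)."""
    return f"{BASE_URL}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the quality record and decode it, assuming a JSON response body."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("lechmazur", "confabulations")` requests the same URL as the `curl` command above and returns the decoded response; each call counts against the 100 requests/day keyless limit.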