lechmazur/confabulations
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
Evaluates RAG systems by measuring both confabulation rates (false answers to unanswerable questions) and non-response rates, across 201 human-verified adversarial questions and 2,612 answerable questions, revealing performance tradeoffs that simple accuracy metrics miss. It combines confabulation and non-response scores into a weighted ranking, exposing models that avoid hallucinations simply by refusing to answer. All models are tested at temperature 0 across major LLM providers, without relying on model-based evaluation, which the author found introduces significant bias.
243 stars. No commits in the last 6 months.
Stars: 243
Forks: 9
Language: HTML
License: —
Category: —
Last pushed: Aug 07, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/lechmazur/confabulations"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
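The endpoint above follows a simple path scheme. A minimal sketch of building the same URL for any repository; note that generalizing the pattern `/api/v1/quality/rag/{owner}/{repo}` to other repos is an assumption inferred from this single example:

```python
# Build the pt-edge quality-API URL for a GitHub repo.
# Assumption: the path pattern /api/v1/quality/rag/{owner}/{repo}
# is inferred from the one example URL shown above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Return the quality-API URL for the given owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

print(quality_url("lechmazur", "confabulations"))
# → https://pt-edge.onrender.com/api/v1/quality/rag/lechmazur/confabulations
```

The built URL can then be fetched with `curl` exactly as shown above; no key is needed within the 100-requests/day limit.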
Higher-rated alternatives
onestardao/WFGY
WFGY: open-source reasoning and debugging infrastructure for RAG and AI agents. Includes the...
KRLabsOrg/verbatim-rag
Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content...
iMoonLab/Hyper-RAG
"Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation"...
frmoretto/clarity-gate
Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a verification protocol for...
chensyCN/LogicRAG
Source code of LogicRAG at AAAI'26.