lechmazur/confabulations

Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.

/ 100

Experimental

Evaluates RAG systems on two measures: the confabulation rate (false answers to unanswerable questions) across 201 human-verified adversarial questions, and the non-response rate across 2,612 answerable questions. Together these reveal performance tradeoffs that simple accuracy metrics miss. Confabulation and non-response scores are combined into weighted rankings, exposing models that avoid hallucinations simply by refusing to answer. Testing is run at temperature 0 across major LLM providers and does not rely on model-based evaluation, which the author found introduces significant bias.
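To make the weighted ranking concrete, here is a minimal Python sketch of how a confabulation rate and a non-response rate could be combined into one score. The equal 0.5 weighting, the `ModelResult` type, the function names, and the two example models are all assumptions made for illustration. The listing does not state the repository's actual weights or formula.

```python
from dataclasses import dataclass

# Question counts taken from the description above.
N_UNANSWERABLE = 201   # adversarial questions with no answer in the documents
N_ANSWERABLE = 2612    # questions that do have an answer in the documents


@dataclass
class ModelResult:
    """Hypothetical per-model tally; not the repository's data format."""
    name: str
    confabulated: int   # false answers given to unanswerable questions
    non_responses: int  # refusals on answerable questions


def confabulation_rate(r: ModelResult) -> float:
    return r.confabulated / N_UNANSWERABLE


def non_response_rate(r: ModelResult) -> float:
    return r.non_responses / N_ANSWERABLE


def combined_score(r: ModelResult, w_confab: float = 0.5) -> float:
    """Weighted average of the two rates (lower is better).

    The default weight of 0.5 is an assumption, not the benchmark's value.
    """
    return w_confab * confabulation_rate(r) + (1.0 - w_confab) * non_response_rate(r)


def rank(results: list[ModelResult], w_confab: float = 0.5) -> list[ModelResult]:
    """Sort models from best (lowest combined score) to worst."""
    return sorted(results, key=lambda r: combined_score(r, w_confab))


if __name__ == "__main__":
    # Made-up tallies for two hypothetical models.
    results = [
        ModelResult("always_refuses", confabulated=0, non_responses=2612),
        ModelResult("balanced", confabulated=20, non_responses=131),
    ]
    for r in rank(results):
        print(f"{r.name}: confab={confabulation_rate(r):.3f} "
              f"non_resp={non_response_rate(r):.3f} "
              f"combined={combined_score(r):.3f}")
```

In this sketch a model that refuses every question has a perfect confabulation rate of 0 but still ranks below the balanced model, which is the tradeoff the combined ranking is meant to expose.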

243 stars. No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 9 / 25


Stars: 243

Forks:

Language: HTML

License: None

Higher-rated alternatives

onestardao/WFGY

WFGY: open-source reasoning and debugging infrastructure for RAG and AI agents. Includes the...

KRLabsOrg/verbatim-rag

Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content...

iMoonLab/Hyper-RAG

"Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation"...

frmoretto/clarity-gate

Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a verification protocol for...

chensyCN/LogicRAG

Source code of LogicRAG at AAAI'26.
