Hallucination Detection RAG Tools

Tools and systems specifically designed to detect, mitigate, verify, and prevent hallucinations in RAG pipelines through claim extraction, evidence retrieval, and factuality validation. Does NOT include general RAG quality monitoring, broader fact-checking systems outside RAG context, or hallucination research in non-RAG LLM applications.

40 hallucination detection RAG tools are tracked. Three score above 50 (Established tier). The highest-rated is onestardao/WFGY at 64/100 with 1,620 stars. One of the top 10 is actively maintained.

Get all 40 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=hallucination-detection-rag&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
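The same request can be made from Python. The following is a minimal sketch using only the standard library; it reuses the URL and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented on this page, the sketch returns the parsed JSON as-is rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="hallucination-detection-rag", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and parse the JSON response.

    The structure of the returned object is an unknown here, so it is
    returned unmodified for the caller to inspect.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to perform the actual request.
    print(build_url())
```

Calling `fetch_projects()` performs one request against the keyless daily quota. Whether the endpoint accepts a `limit` larger than 20 is not stated here, so treat other values as something to verify.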

#	Tool	Score	Tier	Stars	Language
1	onestardao/WFGY WFGY: open-source reasoning and debugging infrastructure for RAG and AI...	64	Established	1,620	Jupyter Notebook
2	KRLabsOrg/verbatim-rag Hallucination-prevention RAG system with verbatim span extraction. Ensures...	56	Established	170	Python
3	iMoonLab/Hyper-RAG "Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven...	51	Established	251	Python
4	frmoretto/clarity-gate Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a...	44	Emerging	23	Python
5	anulum/director-ai Real-time LLM hallucination guardrail — NLI + RAG fact-checking with...	41	Emerging	2	Python
6	project-miracl/nomiracl NoMIRACL: A multilingual hallucination evaluation dataset to evaluate LLM...	38	Emerging	26	Python
7	chensyCN/LogicRAG Source code of LogicRAG at AAAI'26.	36	Emerging	180	Python
8	anlp-team/LTI_Neural_Navigator "Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case...	26	Experimental	45	HTML
9	Betswish/MIRAGE Easy-to-use MIRAGE code for faithful answer attribution in RAG applications....	26	Experimental	26	Python
10	rafay123321/embedding-hallucinations This repo shows how foundational model hallucinates and how we can fix such...	24	Experimental	—	Python
11	rungalileo/hallucination-index Initiative to evaluate and rank the most popular LLMs across common task...	24	Experimental	116	—
12	amitgambhir/rag-auditor Open source RAG evaluation platform — automatically score faithfulness,...	23	Experimental	1	Python
13	MukundaKatta/RAGGuard RAG hallucination detection — verify LLM responses are grounded in source...	22	Experimental	—	Python
14	aryan-bhadana/rag-debugger A production-style RAG debugger with hybrid retrieval, failure detection,...	22	Experimental	—	Python
15	TECHKNOWMAD-LABS/ground-truth Hallucination detection for RAG pipelines.	22	Experimental	—	Python
16	scasella/adaptive_rag_rlm A verifiers RLM environment for testing whether adaptive recursive search...	22	Experimental	—	Python
17	renataennes/rag-hallucination-detector RAG pipeline with bilingual EN/PT hallucination detection	22	Experimental	—	Jupyter Notebook
18	lechmazur/confabulations Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes...	22	Experimental	243	HTML
19	tarekmasryo/rag-qa-logs-and-corpus Multi-table RAG QA telemetry + decision-grade RAG Ops notebook for retrieval...	21	Experimental	2	Jupyter Notebook
20	metawake/raglint pytest-native quality checks for RAG systems. Catches hallucinated entities,...	20	Experimental	1	Python
21	onurcandonmezer/rag-quality-monitor RAG quality monitoring and assurance platform	19	Experimental	—	Python
22	PolarisLiu1/LAT Look As You Think: Unifying Reasoning and Visual Evidence Attribution for...	19	Experimental	8	Python
23	GreyCatVP/raft-canon Architectural canon for production-grade RAFT / RAG systems: evaluation,...	15	Experimental	—	—
24	kareem2002-k/clara-vs-rag-comparison 🔬 Compare CLaRa (latent compression) vs RAG (prompt stuffing) for document...	15	Experimental	—	Python
25	hemanthballa07/HALO-RAG Self-Verification Chains for Hallucination-Free Retrieval-Augmented...	15	Experimental	—	Python
26	bdeva1975/hallucinationbench Detect hallucinations in your RAG pipeline output — in two lines of Python.	15	Experimental	1	Python
27	Padraigobrien08/model-failure-lab Toolkit for discovering, classifying, and debugging failure modes in LLM and...	15	Experimental	1	Python
28	alp-oz/cautious-rag A RAG system that knows when not to answer using concentration inequalities	14	Experimental	—	Python
29	nickhuang99/Intent-Aware-RAG Why Pure Vector Search is a "False Proposition" for RAG?	14	Experimental	3	—
30	yuvaraj949/Dynamic-Uncertainty-Aware-Attribution-RAG Token-level hallucination detection for RAG systems using Contextual...	14	Experimental	—	Python
31	samuel-isr/VeritasRAG A hallucination-resistant Retrieval-Augmented Generation (RAG) system.	14	Experimental	—	Python
32	usal-research/rag_ctxdq Implementation prototype for and executable context-aware data quality assessment	13	Experimental	2	Python
33	Kanisha-Shah/Hallucination-Mitigation-Using-RAG A Columbia University capstone project focused on mitigating hallucinations...	12	Experimental	3	—
34	emory-irlab/conqret-rag Controversial Questions for Argumentation and Retrieval	12	Experimental	4	Python
35	Sakshi3027/rag-handbook-qa A production-ready RAG system with citations and hallucination prevention	12	Experimental	1	Python
36	qualigenai/rag-learning Production-ready RAG system with evaluation framework — zero hallucination,...	11	Experimental	—	Python
37	Arnav-Ajay/rag-failure-modes Failure-first analysis of retrieval-augmented and agentic systems, focused...	11	Experimental	—	Python
38	Arnav-Ajay/rag-systems-foundations A systems-level analysis of static RAG pipelines, isolating ingestion,...	11	Experimental	—	—
39	khaledahmed-Tech/rag-patterns-in-production RAG reliability patterns: failure modes, observability, and quality loops.	11	Experimental	—	—
40	apatni24/VisionQA Context-aware tool for automated BDD test generation and execution using...	11	Experimental	—	Python

Comparisons in this category

Hyper-RAG and LTI_Neural_Navigator (51 vs 26)
Hyper-RAG and RAGGuard (51 vs 22)
verbatim-rag and RAGGuard (56 vs 22)