AnkitNayak-eth/EpsteinFiles-RAG
A RAG pipeline implementation built on the 'Epstein Files 20K' dataset from Hugging Face (Teyler).
Implements document cleaning, intelligent chunking, and MMR-based (maximal marginal relevance) retrieval to process 2.5M+ lines into a searchable vector index using Chroma and Sentence Transformers. Serves answers via FastAPI and a Streamlit UI, powered by LLaMA 3.3 through Groq's inference API. Responses are meant to be grounded in the retrieved source documents only, which is intended to limit hallucination.
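To make the MMR step concrete, here is a minimal, dependency-free sketch of maximal marginal relevance selection over embedding vectors. It is a generic illustration, not the repository's code: the names `cosine`, `mmr_select`, and `lambda_mult` are hypothetical, and the toy 2-D vectors stand in for real Sentence Transformers embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance.

    Each step picks the candidate maximizing
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)
    so lambda_mult=1.0 is plain relevance ranking and lower values favor diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data (assumed for illustration): doc 1 is a near-duplicate of doc 0.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr_select(query, docs, k=2, lambda_mult=0.3)
relevant_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
```

With a low `lambda_mult` the near-duplicate chunk is skipped in favor of a less similar one, which is the usual motivation for MMR when many chunks of a large corpus repeat the same passage.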
Stars: 358
Forks: 58
Language: Python
License: MIT
Category: (none listed)
Last pushed: Feb 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/AnkitNayak-eth/EpsteinFiles-RAG"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
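The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `build_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the listing does not state.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_url(owner, repo):
    """Build the endpoint URL for a repository, percent-encoding each path segment."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner, repo, timeout=10.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    No API key is sent, so the keyless daily limit applies.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("AnkitNayak-eth", "EpsteinFiles-RAG")` issues the same GET as the curl command. The shape of the returned data is not documented here, so inspect it before relying on any field.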
Related tools
OpenBMB/UltraRAG
A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines
Quansight/ragna
RAG orchestration framework ⛵️
microsoft/rag-time
RAG Time: A 5-week Learning Journey to Mastering RAG
microsoft/rag-experiment-accelerator
The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the...
deepset-ai/haystack-rag-app
An example of a RAG backend plus UI