AnkitNayak-eth/EpsteinFiles-RAG

A RAG pipeline implementation built on the 'Epstein Files 20K' dataset from Hugging Face (Teyler).

/ 100

Established

Implements document cleaning, chunking, and MMR-based (maximal marginal relevance) retrieval to turn 2.5M+ lines into a searchable vector index built with Chroma and Sentence Transformers. Serves answers through a FastAPI backend and a Streamlit UI, powered by LLaMA 3.3 through Groq's inference API, with the aim of keeping responses grounded in the source documents rather than hallucinated.
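To illustrate what MMR-based retrieval means here, the following is a minimal, generic sketch of maximal marginal relevance selection over toy embedding vectors. It is not taken from the repository: the function names (`cosine`, `mmr_select`), the `lambda_mult` parameter name, and the example vectors are all assumptions made for illustration, and the project may rely on a library's built-in MMR instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance.

    Hypothetical helper: lambda_mult=1.0 ranks purely by relevance to the query,
    lower values penalise documents that resemble ones already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []

    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data: documents 0 and 1 are near-duplicates, document 2 is different.
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
query = [1.0, 0.0]
diverse = mmr_select(query, docs, k=2, lambda_mult=0.3)
relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
```

With a low `lambda_mult`, the second pick skips the near-duplicate in favour of the more distinct document; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.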

358 stars.

No Package · No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 11 / 25

Community 20 / 25


Stars: 358

Forks

Language: Python

License: MIT

Category: rag-starter-projects

Last pushed: Feb 14, 2026

Commits (30d)


RAG Starter Projects · 101 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/AnkitNayak-eth/EpsteinFiles-RAG"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
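The same request can be made from Python using only the standard library. This is a rough sketch rather than an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state. How an API key would be passed is not documented here, so the sketch covers only the keyless call.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner, repo):
    """Build the quality-score URL for a GitHub owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError and the caller can fall back to the raw text.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_quality("AnkitNayak-eth", "EpsteinFiles-RAG")
# print(data)
```

Since the keyless tier is limited to 100 requests/day, caching the result locally is a sensible choice when polling more than a handful of repositories.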

Related tools

OpenBMB/UltraRAG

A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

Quansight/ragna

RAG orchestration framework ⛵️

microsoft/rag-time

RAG Time: A 5-week Learning Journey to Mastering RAG

microsoft/rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the...

deepset-ai/haystack-rag-app

An example of a RAG backend plus UI
