alex-is-busy-coding/speculative-rag
An implementation of Speculative RAG exploring latency-quality trade-offs in multi-draft retrieval. Features batched parallel drafting via vLLM and log-probability verifier selection for fast, high-quality QA on a single A100 GPU.
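The description mentions log-probability verifier selection without showing what it involves. The following is a minimal sketch of the general idea, not the repository's code: it assumes each draft answer already carries per-token log-probabilities from a verifier model, and it ranks drafts by mean token log-probability. The `Draft` class, the function names, and the length normalization are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Draft:
    """Hypothetical container for one candidate answer."""
    text: str
    token_logprobs: Sequence[float]  # assumed to come from a verifier model


def mean_logprob(draft: Draft) -> float:
    """Length-normalized score so longer drafts are not penalized for length alone."""
    if not draft.token_logprobs:
        return float("-inf")  # an empty draft can never win
    return sum(draft.token_logprobs) / len(draft.token_logprobs)


def select_draft(drafts: Sequence[Draft]) -> Draft:
    """Return the draft with the highest mean token log-probability."""
    if not drafts:
        raise ValueError("at least one draft is required")
    return max(drafts, key=mean_logprob)
```

Using the mean rather than the sum is a design choice made here for illustration; a summed score would favor shorter drafts.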
Stars: —
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/alex-is-busy-coding/speculative-rag"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
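The same request can be made from Python using only the standard library. This is a sketch under assumptions: the path layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single URL above, the helper names are hypothetical, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(owner: str, repo: str, category: str = "rag") -> str:
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner: str, repo: str, category: str = "rag",
                  timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (response format is not assumed)."""
    with urlopen(build_url(owner, repo, category), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("alex-is-busy-coding", "speculative-rag")` requests the same URL as the curl command and counts against the same daily request limit.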
Higher-rated alternatives
beir-cellar/beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across...
superlinear-ai/raglite
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
HKUDS/LightRAG
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
illuin-tech/vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
HKUDS/RAG-Anything
"RAG-Anything: All-in-One RAG Framework"