xmpuspus/kb-arena
Benchmark 7 retrieval strategies on your own docs — naive vector, contextual, QnA pairs, knowledge graph, RAPTOR, PageIndex, and hybrid. Find which KB architecture fits your data.
Implements 8 retrieval strategies (including BM25, knowledge graphs, and RAPTOR) that run in parallel with pluggable LLM backends (Anthropic, OpenAI, Ollama), and auto-generates multi-tier benchmark questions from your documents. Ships a bundled React dashboard with a strategy Arena mode for blind A/B comparison, per-strategy cost tracking, and CI/CD integration via `--fail-below` thresholds. Designed for architecture selection rather than pipeline evaluation.
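The `--fail-below` threshold implies a simple CI gating pattern: compare the best strategy's benchmark score against a floor and fail the job if it falls short. A minimal sketch of that pattern in Python, assuming a score-per-strategy mapping; this is illustrative, not kb-arena's actual output format or implementation:

```python
def gate(scores: dict, fail_below: float) -> int:
    """Return a CI exit code: 0 if the best-scoring strategy meets the
    threshold, 1 otherwise. Mirrors the --fail-below idea; the score
    dictionary shape is an assumption, not kb-arena's real schema."""
    best_name, best_score = max(scores.items(), key=lambda kv: kv[1])
    print(f"best strategy: {best_name} ({best_score:.2f})")
    return 0 if best_score >= fail_below else 1

# Example: hybrid clears a 0.70 floor, so the gate passes (exit code 0).
exit_code = gate({"naive_vector": 0.61, "raptor": 0.68, "hybrid": 0.74}, 0.70)
```

In a pipeline, the returned code would be passed to `sys.exit()` so a sub-threshold run marks the build as failed.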
Available on PyPI.
Stars: 6
Forks: 2
Language: Python
License: MIT
Last pushed: Mar 20, 2026
Monthly downloads: 518
Commits (30d): 0
Dependencies: 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/xmpuspus/kb-arena"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
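The same endpoint can be queried programmatically. A short Python sketch using only the standard library; the `quality_url` helper and the `Authorization: Bearer` header name are assumptions for illustration, since the API's key-passing convention is not documented here:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given repository."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str, api_key=None) -> dict:
    """Fetch the quality record as a dict. The header used to pass
    an API key is an assumption, not documented behavior."""
    req = urllib.request.Request(quality_url(owner, repo))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")  # assumed header
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

print(quality_url("xmpuspus", "kb-arena"))
```

Swapping in another `owner`/`repo` pair queries a different tool's record against the same endpoint.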
Related tools
beir-cellar/beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across...
superlinear-ai/raglite
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
HKUDS/LightRAG
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
illuin-tech/vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
HKUDS/RAG-Anything
"RAG-Anything: All-in-One RAG Framework"