IAAR-Shanghai/CRUD_RAG

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

34 / 100 (Emerging)

Evaluates RAG systems across four task categories (Create, Read, Update, Delete), using a retrieval corpus of 80,000+ Chinese news documents indexed in a Milvus vector database. Implements multiple evaluation metrics, including BLEU, ROUGE, BERTScore, and RAGQuestEval (which uses GPT for question generation and answering). Supports flexible LLM integration through modular APIs for GPT models, locally deployed instances, and remote endpoints, with configurable retrieval parameters and prompt templates tuned for different model scales.
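RAGQuestEval-style scoring generates questions from the reference text, answers them against both reference and candidate, and counts agreement. A minimal conceptual sketch of that loop, with the GPT calls abstracted as callables (all function names here are illustrative, not the repo's actual API):

```python
from typing import Callable, List

def questeval_recall(
    reference: str,
    candidate: str,
    gen_questions: Callable[[str], List[str]],  # hypothetical GPT-backed question generator
    answer: Callable[[str, str], str],          # hypothetical QA model: (question, context) -> answer
) -> float:
    """Fraction of reference-derived questions the candidate answers consistently."""
    questions = gen_questions(reference)
    if not questions:
        return 0.0
    hits = 0
    for q in questions:
        ref_ans = answer(q, reference).strip()
        cand_ans = answer(q, candidate).strip()
        # Count a hit only when the candidate yields a non-empty, matching answer.
        if cand_ans and cand_ans == ref_ans:
            hits += 1
    return hits / len(questions)
```

The actual implementation uses GPT prompts for both steps and finer-grained answer comparison; this sketch only illustrates the question-generate-then-answer structure.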

362 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 14 / 25
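The overall score appears to be the sum of the four pillar scores, each out of 25:

```python
# Pillar scores as shown on this card (each out of 25).
pillars = {"Maintenance": 2, "Adoption": 10, "Maturity": 8, "Community": 14}

total = sum(pillars.values())
print(f"{total} / 100")  # 34 / 100, matching the overall score above
```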


Stars

362

Forks

28

Language

Python

License

None

Last pushed

May 20, 2025

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/IAAR-Shanghai/CRUD_RAG"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
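The same endpoint can be called from Python's standard library; a minimal sketch assuming the endpoint returns a JSON body (the helper names are illustrative):

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report endpoint URL for a repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality report (assumes a JSON response)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

url = quality_url("IAAR-Shanghai", "CRUD_RAG")
print(url)
```

Within the free tier, `fetch_quality("IAAR-Shanghai", "CRUD_RAG")` would retrieve the data shown on this page.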