hhhuang/CAG

Cache-Augmented Generation: A Simple, Efficient Alternative to RAG

/ 100

Emerging

Preloads knowledge documents into the model's KV-cache during initialization, enabling inference without real-time retrieval steps. Supports comparative evaluation against RAG pipelines using BM25 and OpenAI retrievers on SQuAD and HotpotQA datasets, with configurable context lengths and document counts to measure performance tradeoffs. Works with Hugging Face models (tested on Llama-3.1-8B-Instruct) and includes Docker support for reproducible experimentation.
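The preloading idea described above can be illustrated with a toy, dependency-free sketch. It is not the repository's code: the single attention head, the 2-dimensional embeddings, the weight matrices `W_K` and `W_V`, and the names `PreloadedCache` and `answer_without_cache` are all hypothetical, chosen only to show that keys and values computed once for the documents can be reused across queries and give the same result as recomputing them every time.

```python
import math

# Hypothetical toy projection weights (a real model learns these per layer and head).
W_K = [[1.0, 0.0], [0.0, 1.0]]
W_V = [[0.5, 0.5], [1.0, -1.0]]


def project(vec, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(v * w for v, w in zip(vec, row)) for row in weights]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[i] for e, v in zip(exps, values)) for i in range(dim)]


class PreloadedCache:
    """Computes document keys/values once, then reuses them for every query."""

    def __init__(self, doc_embeddings):
        self.keys = [project(e, W_K) for e in doc_embeddings]
        self.values = [project(e, W_V) for e in doc_embeddings]
        self.doc_len = len(self.keys)

    def answer(self, query_embeddings):
        # Copy the cached entries so the preloaded part is never modified.
        keys = list(self.keys)
        values = list(self.values)
        # Only the query tokens are projected at inference time.
        for e in query_embeddings:
            keys.append(project(e, W_K))
            values.append(project(e, W_V))
        # Simplification: the last query embedding is used directly as the query vector.
        return attend(query_embeddings[-1], keys, values)


def answer_without_cache(doc_embeddings, query_embeddings):
    """Baseline: re-project documents and query from scratch on every call."""
    tokens = list(doc_embeddings) + list(query_embeddings)
    keys = [project(e, W_K) for e in tokens]
    values = [project(e, W_V) for e in tokens]
    return attend(query_embeddings[-1], keys, values)


docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = PreloadedCache(docs)  # the "initialization" step
q1 = [[0.5, 0.2]]
q2 = [[0.1, 0.9], [0.3, 0.3]]

for q in (q1, q2):
    cached = cache.answer(q)
    fresh = answer_without_cache(docs, q)
    print(all(math.isclose(a, b) for a, b in zip(cached, fresh)))
```

In this sketch the document projections run once in `PreloadedCache.__init__`, while the baseline repeats them for every query. With a real Hugging Face model the same role would be played by the model's own key/value cache rather than hand-written lists.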

1,471 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 9 / 25

Community 22 / 25


Stars: 1,471

Forks: 217

Language: Python

License: MIT

Higher-rated alternatives

ictnlp/FlexRAG

FlexRAG: A RAG Framework for Information Retrieval and Generation.

VectorInstitute/fed-rag

A framework for fine-tuning retrieval-augmented generation (RAG) systems.

NirDiamant/RAG_Techniques

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG)...

RUC-NLPIR/FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

gomate-community/TrustRAG

TrustRAG: The RAG Framework within Reliable input, Trusted output
