thiswillbeyourgithub/wdoc

Summarize and query large collections of heterogeneous documents: any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, and more.

Score: 73 / 100 (Verified)

Implements a multi-stage RAG pipeline using LangChain and LiteLLM that combines cheap and expensive LLM calls for high-recall document retrieval, then hierarchically aggregates answers via semantic batching to produce sourced markdown output with exact document citations. Supports 15+ filetypes simultaneously (PDFs, EPUBs, Anki decks, audio, video, web pages) and exposes a CLI, a Python library, and a Gradio web UI for flexible integration with any LLM provider or local models.

510 stars and 840 monthly downloads. Actively maintained with 4 commits in the last 30 days. Available on PyPI.

Maintenance 16 / 25
Adoption 17 / 25
Maturity 25 / 25
Community 15 / 25
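The headline score matches the sum of the four category subscores (16 + 17 + 25 + 15 = 73, each out of 25). The aggregation rule is an assumption inferred from that arithmetic, not documented by the site; a minimal sketch:

```python
# Category subscores from the report above (each out of 25).
subscores = {"Maintenance": 16, "Adoption": 17, "Maturity": 25, "Community": 15}

# Assumed aggregation: the overall score is the plain sum of the
# four categories, giving a maximum of 100.
overall = sum(subscores.values())
print(overall)  # 73, matching the headline score
```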


Stars: 510
Forks: 37
Language: Python
License: AGPL-3.0
Last pushed: Mar 08, 2026
Monthly downloads: 840
Commits (30d): 4
Dependencies: 49

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/thiswillbeyourgithub/wdoc"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
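The same endpoint can be called from Python using only the standard library. The URL path mirrors the curl example above; the helper names here are illustrative, and the JSON schema of the response is not documented in this page, so the code only fetches and decodes it:

```python
import json
import urllib.request

# Base of the quality-score endpoint, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a repository's quality report."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Print the URL for the repository described on this page.
    print(quality_url("thiswillbeyourgithub", "wdoc"))
```

Unauthenticated calls are limited to 100 requests/day, so cache responses if you poll many repositories.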