PDF Document Processing RAG Tools
Tools and systems for extracting, parsing, and retrieving information from PDF documents through OCR, layout analysis, and structured data conversion. Does NOT include general chatbots, multi-source document handling beyond PDFs, or chat interfaces built on top of processed PDFs.
58 PDF document processing tools are tracked. Two score above 50 (Established tier). The highest-rated is thiswillbeyourgithub/wdoc at 66/100, with 510 stars and 840 monthly downloads. 1 of the top 10 is actively maintained.
Get all 58 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=rag&subcategory=pdf-document-processing&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
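The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response is returned as parsed JSON without assuming its shape, and whether the API accepts a `limit` larger than the 20 shown in the curl example is an assumption you would need to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="rag", subcategory="pdf-document-processing", limit=20):
    """Build the dataset URL with the same query parameters as the curl call."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and return the decoded JSON payload.

    The payload structure is not documented here, so it is returned as-is.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, counted against the daily quota):
# data = fetch_projects(limit=20)
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to check the query string before spending any of the daily request allowance.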
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | thiswillbeyourgithub/wdoc: Summarize and query from a lot of heterogeneous documents. Any LLM provider,... | 66 | Established |
| 2 | laxmimerit/RAGWire: Production-grade RAG toolkit — ingest PDFs, DOCX, XLSX into Qdrant with LLM... | | Established |
| 3 | NoEdgeAI/pdfdeal: A python wrapper for the Doc2X API and comes with native texts processing... | | Emerging |
| 4 | Arterning/DeepParseX: DeepParseX is a powerful multimodal document parsing and knowledge management platform supporting PDF, Word, Excel, PPT, images, video, audio... | | Emerging |
| 5 | David-Lolly/ViewRAG: A PDF RAG system for text and images: layout-aware chunking, deep chart and figure understanding, and precise visual source tracing. Multimodal PDF RAG: Features... | | Emerging |
| 6 | 3DCF-Labs/doc2dataset: 3DCF / doc2dataset: token-efficient document layer with NumGuard numeric... | | Emerging |
| 7 | preprocess-co/rag-document-viewer: RAG Document Viewer is an open-source library that generates high-fidelity... | | Emerging |
| 8 | atpuxiner/docsloader: This is a documents loader. (Document parsing loader, RAG document parsing, RAG knowledge base construction) | | Emerging |
| 9 | zzstoatzz/raggy: scraping and querying documents for LLMs | | Emerging |
| 10 | ManiAm/RAG-Mail: RAG-Mail is a thread-aware email processing system that semantically indexes... | | Experimental |
| 11 | e-kotov/rdocdump: rdocdump: Dump ‘R’ Package Source, Documentation, and Vignettes into One File | | Experimental |
| 12 | salameaz/pdf-process-rag: A Python-based application that extracts and processes PDF content using a... | | Experimental |
| 13 | MalayAgr/bookacle: bookacle is a RAPTOR-based RAG application to aid in understanding complex... | | Experimental |
| 14 | antoninomariarizzo/rag: A Python library for Retrieval-Augmented Generation (RAG) that extracts text... | | Experimental |
| 15 | MohammedNasserAhmed/RAGPost: RAGPost is an intelligent blog post generator that leverages... | | Experimental |
| 16 | Nexialism-Friday/hwpx-toolkit: HWP/HWPX document processing toolkit — extraction, generation, vectorization... | | Experimental |
| 17 | salim-lakhal/rag-document-pipeline: Production RAG pipeline: multi-format document extraction → intelligent... | | Experimental |
| 18 | S0lkar/IntGathering-x-RAG--BlazingDocs: RAG-based tool for document batch querying. | | Experimental |
| 19 | SStephanJX/Snowflake-RAG-System: Production-ready Snowflake RAG system with type-specific chunking | | Experimental |
| 20 | natanhp/PythoRAG: PythoRAG is a simple, open-source project designed to facilitate... | | Experimental |
| 21 | AKSHAYINDIA05/Document_Comparison_System: Implement a Retrieval Augmented Generation (RAG) with a user interface for... | | Experimental |
| 22 | Besthope-Official/predoc: Preprocess document service for RAG (Retrieval Augmented Generation) | | Experimental |
| 23 | iamarunbrahma/rag-ingest: RAG-Ingest: A tool for converting PDFs to markdown and indexing them for... | | Experimental |
| 24 | ParthSareen/simple-rag: Too many docs? Quickly search over any PDF or Markdown documents | | Experimental |
| 25 | yotaken/docuggez: Automatic project documentator | | Experimental |
| 26 | FrostWillmott/FinDocBot: Modern RAG, designed for semantic search and question-answering over... | | Experimental |
| 27 | juhaodong/large-file-translator: Extract the content while preserving the layout, images, and tables. Perform... | | Experimental |
| 28 | liunian-Jay/MU-GOT: PDF Parsing Tool: GOT's vLLM acceleration implementation, MinerU for layout... | | Experimental |
| 29 | JochiRaider/sievio: Sievio turns GitHub, local repos, and web PDFs into clean JSONL for LLM... | | Experimental |
| 30 | este6an13/checks-ocr: Software that applies OCR + RAG to extract bank checks information | | Experimental |
| 31 | lolbigtime/Folio: Zero-config Swifty RAG toolkit for iOS & macOS — PDF/text loaders, universal... | | Experimental |
| 32 | silas-rickards/PDF-LLM-RAG: A RAG pipeline specialized for local pdfs. | | Experimental |
| 33 | Vibhuarvind/Content-Engine-RAG-for-PDF: Content Engine is RAG system that analyzes and compares multiple PDF... | | Experimental |
| 34 | slvg01/90.10d_RAG_OnTheFly: An app allowing to upload files (ppt, doc, pdf, zip) and RAG on their content | | Experimental |
| 35 | A-Najjar/rag-factory: Modular RAG system with Factory Pattern - Load PDF/Word docs, configure... | | Experimental |
| 36 | husaynirfan1/PullData: RAG with response in what you need. Output directly with supported format... | | Experimental |
| 37 | solomonjie/rag-processor: RAG index pipeline, from raw data clean to index. each step communicate via... | | Experimental |
| 38 | JuliaGenAI/DocsScraper.jl: Efficient RAG knowledge pack creator from online Julia documentation | | Experimental |
| 39 | ashwyan/local-llm-pdf-analyzer: A local AI tool using Ollama (Llama 3) to analyze PDF documents and generate... | | Experimental |
| 40 | Clearedge-AI/clearedge: Build a RAG preprocessing pipeline | | Experimental |
| 41 | alrafiabdullah/doc_rag: Document RAG with HuggingFace Token | | Experimental |
| 42 | yagmur-kurtbas/pdf-rag-pipeline: A RAG pipeline for PDF question answering using LangChain, ChromaDB and Groq... | | Experimental |
| 43 | ahmad-albasha/DataForg: PDF to JSON pipeline with intelligent bilingual chunking (AR/EN) and a fully... | | Experimental |
| 44 | ritheesh-dev/Local-PDF-RAG-System: Privacy-first local PDF RAG system using FAISS + Ollama — fully offline,... | | Experimental |
| 45 | avocatt/ocr-rag-highlighted-viewer: OCR + RAG document viewer with highlighted search results | | Experimental |
| 46 | will695672804/graphrag-engineering-pdfs: 🔍 Extract entities and build knowledge graphs from large engineering PDFs,... | | Experimental |
| 47 | fllin1/mawa: RAG workflow (Mistral OCR + Gemini) for complex regulatory PDFs.... | | Experimental |
| 48 | zenmakhlouf/arabic-rag-pipeline: A single-file RAG pipeline for Arabic PDF lectures with two-stage retrieval,... | | Experimental |
| 49 | shivkhurana/technical-docs-rag-pipeline: Enterprise-grade RAG (Retrieval Augmented Generation) pipeline using... | | Experimental |
| 50 | andersborgabiro/RagQueryDocuments: RAG application that makes it easy to search in multiple documents | | Experimental |
| 51 | malkhabir/EasyRag: EasyRag shows you how to embed and query table documents from your own local... | | Experimental |
| 52 | julicq/PDF-RAG-Query: RAG model for PDF database | | Experimental |
| 53 | Qinnovation123/papers: PDF embedding workflow | | Experimental |
| 54 | bazilicum/pdf-query: This project processes and retrieves information from PDF file or PDF... | | Experimental |
| 55 | adrianizmi/Simple-RAG: Minimalist RAG system built from scratch using Python, local embeddings, and... | | Experimental |
| 56 | 2dogsandanerd/rag_pdf_audit: Tool to compare pdf extraction methods | | Experimental |
| 57 | nkarast/ask-my-pdf: A RAG application using local LLM to answer questions given a PDF. | | Experimental |
| 58 | sfkunal/librarian: Librarian is a RAG-assisted LLM application that allows any user to query... | | Experimental |