Document Intelligence RAG Embedding Tools
Tools for uploading, searching, and conversationally querying documents (PDFs, files, etc.) using embeddings and semantic search to extract insights and answers. Does NOT include code documentation generation, code search, or cross-document fact-checking systems.
There are 52 document intelligence RAG tools tracked. The highest-rated is maxent-ai/ocrpy, which scores 42/100 and has 224 stars and 167 monthly downloads.
Get all 52 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=document-intelligence-rag&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
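The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response schema is not described on this page (so the parsed JSON is returned unchanged), and the default `limit=20` simply mirrors the curl command above. Whether a larger `limit` returns all 52 projects in one call is an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="document-intelligence-rag", limit=20):
    """Build the dataset URL with the same query parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and decode the JSON body.

    The response structure is not documented here, so the decoded
    value is returned as-is for the caller to inspect.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one keyless request, which counts against the 100 requests/day allowance, so caching the result locally is a sensible default.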
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | maxent-ai/ocrpy<br>OCR, Archive, Index and Search: Implementation agnostic OCR framework. | | Emerging |
| 2 | haven-jeon/LegalQA<br>Korean LegalQA using SentenceKoBART | | Emerging |
| 3 | ametnes/nesis<br>Your AI Powered Enterprise Knowledge Partner. Designed to be used at scale... | | Emerging |
| 4 | foxminchan/LawKnowledge<br>A legal knowledge search and Q&A application based on Vietnam's Legal Code... | | Emerging |
| 5 | utachicodes/PyDocEnhancer<br>An AI-powered Python plugin to enhance documentation with summaries, code... | | Emerging |
| 6 | ryanlane/document-manager<br>Local-first document archive assistant for semantic search and RAG using... | | Emerging |
| 7 | intel/document-automation<br>Document Automation Reference Kit | | Emerging |
| 8 | machinelearningZH/document-research-tool<br>Perform intelligent research over document collections using hybrid search and LLMs. | | Emerging |
| 9 | joe32140/tei-qdrant-cache<br>Docker Compose stack for scalable TEI embeddings (multi-GPU) fronted by a... | | Experimental |
| 10 | Schematise-Lex-Data-Analysis/lex-liberalis<br>A fork of Semantra for Indian court judgments | | Experimental |
| 11 | rahulapjs/QueryDoc<br>QueryDoc is a full-stack RAG application that enables secure,... | | Experimental |
| 12 | kchanda24/hackathon-backend<br>Enterprise Content Management MVP with semantic search capabilities. Upload... | | Experimental |
| 13 | FellowTraveler/ngest<br>Python script for ingesting various files into a semantic graph. For text,... | | Experimental |
| 14 | mcplusa/elastic-ingest-http<br>This is an Elasticsearch Ingest Pipeline Processor that calls an HTTP(s)... | | Experimental |
| 15 | Leg0shii/smart-documents<br>A web application that enables users to upload documents and utilize AI... | | Experimental |
| 16 | josego85/pdf-content-search<br>🔍 AI-powered PDF search with OCR support for scanned documents, local AI via... | | Experimental |
| 17 | HarshilMaks/InsightDocs<br>AI Document Intelligence System for deep analysis and semantic querying of... | | Experimental |
| 18 | moonlitrevery/DodocLens<br>Document intelligence with local AI (OCR + semantic search) for PDFs and... | | Experimental |
| 19 | lookingforvirus/fastapi_auto_routes<br>⚡ Generate dynamic CRUD and Auth routes effortlessly with FastAPI Auto... | | Experimental |
| 20 | gracee3/qdrant-bge-stack<br>Local deployment stack for Qdrant vector search with vLLM-served BAAI... | | Experimental |
| 21 | arcnem-ai/arcnem-vision<br>An AI-powered platform for document ingestion, processing, and similarity search | | Experimental |
| 22 | sapientpants/doctrans<br>Privacy-first document translation powered by local AI (Ollama). Upload PDF,... | | Experimental |
| 23 | xhulianokoci/DocCompareAI<br>ASP.NET Core API for comparing Word documents with AI: text diff, OpenAI... | | Experimental |
| 24 | danilagoleen/vetka-ingest-engine<br>Ingestion/indexing core for agent systems: scanning, extraction, dependency... | | Experimental |
| 25 | VedantKothari01/DocInsight<br>AI-powered document originality and plagiarism risk detection system... | | Experimental |
| 26 | mry0tt4/DocGenie<br>AI-powered documentation platform that automatically generates, categorizes,... | | Experimental |
| 27 | Tonemon/StaxRead<br>Self-hosted semantic search over your own documents. Your own self-hosted... | | Experimental |
| 28 | LeonKiptoo/document-intelligence-engine<br>A document intelligence system that enables semantic question answering over... | | Experimental |
| 29 | ashankgupta/docai<br>DocAI is a Go-based toolkit that enables intelligent interaction with your... | | Experimental |
| 30 | ivan-markov-666/rag-jag<br>Self-hosted AI-powered document search system. Upload PDF, DOCX, TXT, and... | | Experimental |
| 31 | harshsrivastava05/Document-Analyzer<br>An AI-powered document analysis platform that transforms uploaded files into... | | Experimental |
| 32 | HemalDholakiya12/PDFChat<br>A web app that allows users to upload PDFs and interact with them through a... | | Experimental |
| 33 | ventz/pdf-semantic-keyword-analysis<br>High-performance PDF Semantic keyword analysis tool using AI for intelligent... | | Experimental |
| 34 | akbar-ops/sistema-de-analisis-de-documentos-juridicos<br>📄 Analyze, classify, and search legal documents with advanced NLP techniques... | | Experimental |
| 35 | cosmanBrenden/DocumentMuncher<br>DocumentMuncher is a locally running document search engine that allows you... | | Experimental |
| 36 | SSK-14/GitDoc-AI<br>GitDoc is your ultimate GitHub Documentation Explorer! It's your trusty... | | Experimental |
| 37 | KaramelBytes/docloom-cli<br>AI‑augmented document analysis and lightweight retrieval (Go) with... | | Experimental |
| 38 | KaavyaGala546/DocuMind-AI<br>DocuMind-AI is an AI-powered document assistant that allows users to upload... | | Experimental |
| 39 | Irshad-11/PDF-INSIGHTS<br>Smart PDF Analyzer with OCR and Semantic Search | | Experimental |
| 40 | mamoon-17/DocuQuery<br>DocuQuery, a minimal RAG demo: upload PDFs, generate local embeddings,... | | Experimental |
| 41 | maharishiayurveda/DocQuify<br>Extract insights from research papers with DocQuify. Upload PDFs and ask... | | Experimental |
| 42 | NathanMaine/rah-qdrant-integration<br>Community add-on for RA-H OS that replaces sqlite-vec with Qdrant for vector... | | Experimental |
| 43 | Helixo613/docforensics<br>Cross-document contradiction and agreement detection for PDF collections... | | Experimental |
| 44 | Naturestudyperinatologist466/fojin<br>Aggregate and search over 9,200 Buddhist texts in multiple languages with... | | Experimental |
| 45 | JacobPolloreno/OfficeAnswers<br>Get to the real work by using neural information retrieval for company information. | | Experimental |
| 46 | dyannadle/AI-Powered-Search-Over-Noation<br>An AI-powered document search engine that connects to Notion and Google... | | Experimental |
| 47 | tstephx/book-ingestion-python<br>Book ingestion pipeline for processing PDF/EPUB into searchable chapters... | | Experimental |
| 48 | devinitive-team/mirage<br>🏜️ Mirage: Universal, relevance search over PDF documents at any scale.... | | Experimental |
| 49 | KishoreMuruganantham/HackRx-6.0-Intelligent-Query-Retrieval<br>LLM-powered system for intelligent query–retrieval from large documents in... | | Experimental |
| 50 | gururaser/qdrant-data-processor<br>A high-performance data ingestion pipeline in Go for processing Amazon... | | Experimental |
| 51 | kstv364/intellidoc<br>Hackathon project - Intellidoc - ECM MVP with semantic search capabilities.... | | Experimental |
| 52 | bivex/qdrant_streamlit_generator_via_groq<br>🔍 QDRANT + STREAMLIT + GROQ = VECTOR SEARCH UI. Explore embeddings.... | | Experimental |