Document Intelligence RAG Embedding Tools

Tools for uploading, searching, and conversationally querying documents (PDFs, files, etc.) using embeddings and semantic search to extract insights and answers. Does NOT include code documentation generation, code search, or cross-document fact-checking systems.

There are 52 document intelligence RAG tools tracked. The highest-rated is maxent-ai/ocrpy at 42/100 with 224 stars and 167 monthly downloads.

Get all 52 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=document-intelligence-rag&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
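The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not described on this page, so the decoded JSON is returned unchanged, and whether the endpoint accepts a `limit` above 20 is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the dataset URL; `limit` mirrors the query parameter in the curl example."""
    params = {
        "domain": "embeddings",
        "subcategory": "document-intelligence-rag",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 30.0):
    """Fetch and decode the JSON response.

    The response schema is not documented on this page, so the decoded
    payload is returned as-is rather than unpacked into named fields.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` issues the same request as the curl command. Passing `limit=52` may return the full list, but that depends on the server honoring the larger value. Keyless access is capped at 100 requests/day, so cache the result locally rather than refetching.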

# Tool Score Tier
1 maxent-ai/ocrpy

OCR, Archive, Index and Search: Implementation agnostic OCR framework.

42
Emerging
2 haven-jeon/LegalQA

Korean LegalQA using SentenceKoBART

38
Emerging
3 ametnes/nesis

Your AI Powered Enterprise Knowledge Partner. Designed to be used at scale...

36
Emerging
4 foxminchan/LawKnowledge

A legal knowledge search and Q&A application based on Vietnam's Legal Code...

33
Emerging
5 utachicodes/PyDocEnhancer

An AI-powered Python plugin to enhance documentation with summaries, code...

33
Emerging
6 ryanlane/document-manager

Local-first document archive assistant for semantic search and RAG using...

33
Emerging
7 intel/document-automation

Document Automation Reference Kit

31
Emerging
8 machinelearningZH/document-research-tool

Perform intelligent research over document collections using hybrid search and LLMs.

31
Emerging
9 joe32140/tei-qdrant-cache

Docker Compose stack for scalable TEI embeddings (multi-GPU) fronted by a...

27
Experimental
10 Schematise-Lex-Data-Analysis/lex-liberalis

A fork of Semantra for Indian court judgments

27
Experimental
11 rahulapjs/QueryDoc

QueryDoc is a full-stack RAG application that enables secure,...

25
Experimental
12 kchanda24/hackathon-backend

Enterprise Content Management MVP with semantic search capabilities. Upload...

24
Experimental
13 FellowTraveler/ngest

Python script for ingesting various files into a semantic graph. For text,...

24
Experimental
14 mcplusa/elastic-ingest-http

This is an Elasticsearch Ingest Pipeline Processor that calls an HTTP(s)...

23
Experimental
15 Leg0shii/smart-documents

A web application that enables users to upload documents and utilize AI...

23
Experimental
16 josego85/pdf-content-search

🔍 AI-powered PDF search with OCR support for scanned documents, local AI via...

23
Experimental
17 HarshilMaks/InsightDocs

AI Document Intelligence System for deep analysis and semantic querying of...

23
Experimental
18 moonlitrevery/DodocLens

Document intelligence with local AI (OCR + semantic search) for PDFs and...

23
Experimental
19 lookingforvirus/fastapi_auto_routes

⚡ Generate dynamic CRUD and Auth routes effortlessly with FastAPI Auto...

22
Experimental
20 gracee3/qdrant-bge-stack

Local deployment stack for Qdrant vector search with vLLM-served BAAI...

22
Experimental
21 arcnem-ai/arcnem-vision

An AI-powered platform for document ingestion, processing, and similarity search

22
Experimental
22 sapientpants/doctrans

Privacy-first document translation powered by local AI (Ollama). Upload PDF,...

22
Experimental
23 xhulianokoci/DocCompareAI

ASP.NET Core API for comparing Word documents with AI — text diff, OpenAI...

20
Experimental
24 danilagoleen/vetka-ingest-engine

Ingestion/indexing core for agent systems: scanning, extraction, dependency...

20
Experimental
25 VedantKothari01/DocInsight

AI-powered document originality and plagiarism risk detection system...

20
Experimental
26 mry0tt4/DocGenie

AI-powered documentation platform that automatically generates, categorizes,...

20
Experimental
27 Tonemon/StaxRead

Self-hosted semantic search over your own documents. Your own self-hosted...

19
Experimental
28 LeonKiptoo/document-intelligence-engine

A document intelligence system that enables semantic question answering over...

19
Experimental
29 ashankgupta/docai

DocAI is a Go-based toolkit that enables intelligent interaction with your...

19
Experimental
30 ivan-markov-666/rag-jag

Self-hosted AI-powered document search system. Upload PDF, DOCX, TXT, and...

19
Experimental
31 harshsrivastava05/Document-Analyzer

An AI-powered document analysis platform that transforms uploaded files into...

17
Experimental
32 HemalDholakiya12/PDFChat

A web app that allows users to upload PDFs and interact with them through a...

17
Experimental
33 ventz/pdf-semantic-keyword-analysis

High-performance PDF Semantic keyword analysis tool using AI for intelligent...

16
Experimental
34 akbar-ops/sistema-de-analisis-de-documentos-juridicos

📄 Analyze, classify, and search legal documents with advanced NLP techniques...

16
Experimental
35 cosmanBrenden/DocumentMuncher

DocumentMuncher is a locally running document search engine that allows you...

15
Experimental
36 SSK-14/GitDoc-AI

GitDoc is your ultimate GitHub Documentation Explorer! It's your trusty...

15
Experimental
37 KaramelBytes/docloom-cli

AI‑augmented document analysis and lightweight retrieval (Go) with...

15
Experimental
38 KaavyaGala546/DocuMind-AI

DocuMind-AI is an AI-powered document assistant that allows users to upload...

15
Experimental
39 Irshad-11/PDF-INSIGHTS

Smart PDF Analyzer with OCR and Semantic Search

14
Experimental
40 mamoon-17/DocuQuery

DocuQuery — a minimal RAG demo: upload PDFs, generate local embeddings,...

14
Experimental
41 maharishiayurveda/DocQuify

Extract insights from research papers with DocQuify. Upload PDFs and ask...

14
Experimental
42 NathanMaine/rah-qdrant-integration

Community add-on for RA-H OS that replaces sqlite-vec with Qdrant for vector...

14
Experimental
43 Helixo613/docforensics

Cross-document contradiction and agreement detection for PDF collections...

14
Experimental
44 Naturestudyperinatologist466/fojin

Aggregate and search over 9,200 Buddhist texts in multiple languages with...

14
Experimental
45 JacobPolloreno/OfficeAnswers

Get to the real work by using neural information retrieval for company information.

12
Experimental
46 dyannadle/AI-Powered-Search-Over-Noation

An AI-powered document search engine that connects to Notion and Google...

11
Experimental
47 tstephx/book-ingestion-python

Book ingestion pipeline for processing PDF/EPUB into searchable chapters...

11
Experimental
48 devinitive-team/mirage

🏜️ Mirage: Universal, relevance search over PDF documents at any scale....

11
Experimental
49 KishoreMuruganantham/HackRx-6.0-Intelligent-Query-Retrieval

LLM-powered system for intelligent query–retrieval from large documents in...

11
Experimental
50 gururaser/qdrant-data-processor

A high-performance data ingestion pipeline in Go for processing Amazon...

11
Experimental
51 kstv364/intellidoc

Hackathon project - Intellidoc - ECM MVP with semantic search capabilities....

11
Experimental
52 bivex/qdrant_streamlit_generator_via_groq

🔍 QDRANT + STREAMLIT + GROQ = VECTOR SEARCH UI. Explore embeddings....

11
Experimental