Document OCR Extraction NLP Tools
Tools for extracting structured and unstructured text from documents (PDFs, scans, receipts, invoices, IDs) using OCR and computer vision. Does NOT include general document analysis, summarization, or retrieval systems without extraction focus.
57 document OCR extraction tools are tracked. 2 score above 70 (verified tier). The highest-rated is deepdoctection/deepdoctection at 85/100, with 3,147 stars and 5,833 monthly downloads. 2 of the top 10 are actively maintained.
Get all 57 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=document-ocr-extraction&limit=20"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
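The same request can be made from a script. The following is a minimal Python sketch that uses only the standard library and only the three query parameters visible in the curl command above. The helper names `build_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the decoded payload without interpreting it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="document-ocr-extraction", limit=20):
    """Assemble the query URL from the parameters seen in the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """GET the URL and decode the body as JSON.

    The payload structure is an unknown here, so it is returned as-is.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_url())` would issue the same request as the curl command. Whether a larger `limit` value returns all 57 entries in one response is not stated on this page, so treat that as something to verify against the API.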
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | deepdoctection/deepdoctection: A Repo For Document AI | | Verified |
| 2 | deanmalmgren/textract: extract text from any document. no muss. no fuss. | | Verified |
| 3 | eikek/docspell: Assist in organizing your piles of documents, resulting from scanners,... | | Established |
| 4 | clovaai/donut: Official Implementation of OCR-free Document Understanding Transformer... | | Emerging |
| 5 | axa-group/Parsr: Transforms PDF, Documents and Images into Enriched Structured Data | | Emerging |
| 6 | zzzDavid/ICDAR-2019-SROIE: ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information... | | Emerging |
| 7 | Saransh-cpp/OCRed: Clever, simple, and intuitive wrapper functionalities for OCRing specific... | | Emerging |
| 8 | rithulkamesh/docproc: Document Intelligence Platform - Extract, refine, and query documents with... | | Emerging |
| 9 | gnana70/tamil_ocr: OCR Tamil is a powerful tool that can detect and recognize text in Tamil... | | Emerging |
| 10 | JonnoB/reading_the_unreadable: A pipeline for performing OCR on historical newspapers | | Emerging |
| 11 | Rushi-Balapure/pdf_2_json_extractor: A high-performance Python library for extracting structured content from PDF... | | Emerging |
| 12 | NjoyimPeguy/ICDAR-2019-RRC-SROIE: ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information... | | Emerging |
| 13 | s3nh/text-detector: Tool which allow you to detect and translate text. | | Emerging |
| 14 | gani114433/OCR_workflow: N8N OCR workflow | | Emerging |
| 15 | Shulk97/daniel: This repository contain the implementation of DANIEL. (A fast Document... | | Experimental |
| 16 | clovaai/webvicob: Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023 | | Experimental |
| 17 | louisbrulenaudet/apple-ocr: Easy-to-Use Apple Vision wrapper for text extraction, scalar representation... | | Experimental |
| 18 | situx/CuneiPainter: An App to recognize cuneiform characters on your Android phone | | Experimental |
| 19 | lukevanin/OCRAI: Optical Character Recognition Artificial Intelligence iOS app for Udacity nanodegree | | Experimental |
| 20 | trhgquan/OCR_chu_nom: Chữ Nôm OCR course project (CSC15006) | | Experimental |
| 21 | Samuel310/Text-Recognition: Android application to extract text from an image using firebase MLkit. | | Experimental |
| 22 | codebywiam/invoice-ocr: This project extracts key fields (like invoice number, date, total, and... | | Experimental |
| 23 | ierolsen/Business-Card-Reader-App: The main idea of this project is that extracting entities from the scanned... | | Experimental |
| 24 | jweissenberger/auto-docs: A CLI tool that automatically generates documentation for python code using... | | Experimental |
| 25 | Zer0-Bug/ID-Document_Recognition: End-to-end offline OCR and semantic parsing pipeline for identity documents... | | Experimental |
| 26 | macosnik/Recognize-text-from-image: Telegram bot for recognizing text in images using neural networks | | Experimental |
| 27 | DecisionNerd/docunderstand: A python system for Visually Rich Document Understanding | | Experimental |
| 28 | SundayOni/document-ocr-nlp-pipeline: End-to-end pipeline for extracting and structuring text from scanned, PDF... | | Experimental |
| 29 | nicdriebe/ocr-ner-sharepic-evaluation: Bachelor's Thesis: Evaluation of open-source OCR and NER pipelines... | | Experimental |
| 30 | michael-borck/document-lens: Analyzes text documents for readability, academic integrity, and linguistic... | | Experimental |
| 31 | avrtt/MobileEAST: Paper and code for a lightweight & fast scene text detection based on EAST... | | Experimental |
| 32 | itshivams/Persona-Driven-Document-Intelligence: Persona-Driven Document Intelligence – A lightweight, CPU-only system that... | | Experimental |
| 33 | isikmuhamm/unstructured-data-extraction-engine: Automated data ingestion pipeline for extracting plain text from proprietary... | | Experimental |
| 34 | transybao1393/android-ocr: Android OCR using CameraX, support MLKit, support offline mode, support... | | Experimental |
| 35 | fmadore/iwac-ai-pipelines: AI pipelines for Omeka S digital collections - OCR correction, entity... | | Experimental |
| 36 | erl-ang/interactive-ocr: Implementation of a couple of heuristics that estimate OCR quality without... | | Experimental |
| 37 | meck93/ScanOrUploadMe: A React-Native mobile application that digitalizes physical event... | | Experimental |
| 38 | xuan3986/Texthandle: Open source project provided to Baidu PaddlePaddle community. Apply... | | Experimental |
| 39 | marekpridal/Vision-OCR-Demo: Sample project for on-device text recognition | | Experimental |
| 40 | iytedbb/OSPA-SuryaOCR: OSPA SuryaOCR – Advanced document processing framework for historical... | | Experimental |
| 41 | SivaPA08/text-capture: Captures screen regions, extracts text and copies it to the clipboard | | Experimental |
| 42 | dev-sungman/recent-ocr-papers: this repo include paper review, code in text detection, text recognition,... | | Experimental |
| 43 | asainov1/invoice-generator-agent: Telegram bot for invoice generation - OCR (Tesseract) → NLP parsing → PDF... | | Experimental |
| 44 | Komorebirumu/awe-ms-20260315-2211-01: AI Historical Document Transcription & Analysis CLI Tool | | Experimental |
| 45 | archity/doc-scanner: Computer Vision and NLP based document scanner, text extractor and summarizer. | | Experimental |
| 46 | esteininger/file-processor: A Python library that uses AI to convert unstructured files (like PDFs,... | | Experimental |
| 47 | HySonLab/TeBaAb: TeBaAb: Text-Based Antigen-Conditioned Antibody Redesign via Directed Evolution | | Experimental |
| 48 | Keizouw8/OCR-Command-Line-Tool: A tool that can be used in the CLI or NodeJS environment to scan for text in... | | Experimental |
| 49 | shubh11220/PDF-Text-Extraction: Create a data extraction platform for users to conveniently obtain data in a... | | Experimental |
| 50 | avirajsa/DocuMind: DocuMind - Python project for document analysis. Analyze, summarize, and... | | Experimental |
| 51 | Cool-fire/Snipps: 📚 📝📜 A simple android app to convert information into digital snippets,... | | Experimental |
| 52 | mishaelaaa/OCR: This is a project in which I store all my attempts to create an application... | | Experimental |
| 53 | saloni-rangari/nlp-ocr-marathi: This mini-project implements Marathi handwritten text recognition using... | | Experimental |
| 54 | husnutass/ml_kit_app: A Flutter mobile app to read data from business cards and save that data in... | | Experimental |
| 55 | emilyhasson/Text-Recognition: Scripts to convert low-quality scanned PDFs to text files using Google Cloud... | | Experimental |
| 56 | fdovila/PDF2TXT4NLP: an online Python web app that accepts academic articles in PDF format and... | | Experimental |
| 57 | Prateek32177/TextlyAI: AI-powered tool to extract and classify text from images using OCR and... | | Experimental |