Document Intelligence Extraction ML Frameworks
Tools for extracting, analyzing, and structuring data from documents (PDFs, images, administrative files) using OCR, deep learning, and NLP. Includes document management, parsing, and information retrieval. Does NOT include general document conversion, presentation generation, or book production/typesetting.
There are 99 document intelligence extraction frameworks tracked. 4 score above 50 (established tier). The highest-rated is paperless-ngx/paperless-ngx at 69/100 with 37,318 stars. 1 of the top 10 is actively maintained.
Get all 99 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=document-intelligence-extraction&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
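The same request can be issued from a script. The following is a minimal Python sketch, assuming the endpoint returns a JSON body; the response schema is not described on this page, so the code parses the body without assuming any field names. The helper names `build_url` and `fetch_projects` are hypothetical, and the query parameters are copied from the curl command above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "document-intelligence-extraction",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and return the parsed JSON.

    Assumption: the response body is JSON. Its structure is not
    documented here, so it is returned as-is.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects()` performs one keyless request, which counts against the 100 requests/day allowance. Whether a larger `limit` value returns all 99 projects in a single response is not stated here and would need to be checked against the API.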
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | paperless-ngx/paperless-ngx | A community-supported supercharged document management system: scan, index... | | Established |
| 2 | GoogleCloudPlatform/document-ai-samples | Sample applications and demos for Document AI, the end-to-end document... | | Established |
| 3 | aphp/edspdf | EDS-PDF is a generic, pure-Python framework for text extraction from PDF... | | Established |
| 4 | aws-solutions/document-understanding-solution | Example of integrating & using Amazon Textract, Amazon Comprehend, Amazon... | | Established |
| 5 | naiveHobo/InvoiceNet | Deep neural network to extract intelligent information from invoice documents. | | Emerging |
| 6 | jonaswinkler/paperless-ng | A supercharged version of paperless: scan, index and archive all your... | | Emerging |
| 7 | ptmrio/autorename-pdf | autorename-pdf is a highly efficient tool designed to automatically rename... | | Emerging |
| 8 | kiku-jw/DocStripper | 🧹 DocStripper is a lightweight CLI utility that automatically cleans text documents | | Emerging |
| 9 | jennis0/burdoc | Advanced PDF parsing for python | | Emerging |
| 10 | AkshayG999/MistralOCR---AI-Powered-Document-Extraction | MistralOCR is an open-source application that transforms documents into... | | Emerging |
| 11 | unknownman1244/ai-humanizer-api | 🤖 Transform AI text into human-like writing with the AI Humanizer API,... | | Emerging |
| 12 | BananaPuke/pdf-brain | 📚 Index and enrich your PDFs and Markdown files locally for a powerful,... | | Emerging |
| 13 | StabRise/ScaleDP | ScaleDP is an Open-Source extension of Apache Spark for Document Processing | | Emerging |
| 14 | vladzima/neuronaming-dev | Open-source: AI powered business names generator. Proof of concept. | | Emerging |
| 15 | adhorn/poliko | Demo web applications that use AWS Artificial Intelligence services ... | | Emerging |
| 16 | study-assist/browser-extension | A tool to help you organise your bookmarks intelligently | | Emerging |
| 17 | MSUSAzureAccelerators/Intelligent-Document-Processing-Accelerator | Showcase Azure platform’s machine learning capability to recognize document... | | Emerging |
| 18 | Unstructured-IO/community | Open source libraries and APIs to build custom preprocessing pipelines for... | | Emerging |
| 19 | HT0710/Receipt-Information-Extraction | Receipt-Information-Extraction | | Emerging |
| 20 | aldawsarir/Vortex | AI-powered visual search and document understanding system that transforms... | | Experimental |
| 21 | Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service | Cloud-based service for uploading, scraping, and managing PDF/DOCX... | | Experimental |
| 22 | bloomsburyai/ctrlf-tutorial | AI powered Ctrl-F using alpha.thecape.ai API | | Experimental |
| 23 | Ramtin-Karbaschi/enHumanizer_Bot | Transform AI-generated text to be indistinguishable from human writing.... | | Experimental |
| 24 | amanyagami/Make_Presentation_Simple.io | 📄➡️📊 Convert PDFs into AI-generated presentation decks using a fully... | | Experimental |
| 25 | machinelearningZH/ogd_ai-metafairy | An app that helps you easily create high quality dataset descriptions – with... | | Experimental |
| 26 | TheAkshatGupta/Intelligent-Document-Parsing-FinTech | NLP-based system to extract structured information from financial documents | | Experimental |
| 27 | jwc524/clippy | A smart PDF reader that extracts text and generates headings and summaries... | | Experimental |
| 28 | akshata29/digitalclaims | Microsoft Insurance Claims Automation, powered by AI, handles claim... | | Experimental |
| 29 | FzS92/smart-pdf-highlighter | Automatically identify and highlight key content within PDF files using... | | Experimental |
| 30 | shubhigupta991/PaperTxT | We plan to create an AI which has analytical reading and answering... | | Experimental |
| 31 | gliff-ai/audit | gliff.ai AUDIT – a user-friendly browser interface for exploring a fully... | | Experimental |
| 32 | jeson4535/ocrisp | 📄 Implement your RAG workflow effortlessly with this all-in-one tool for... | | Experimental |
| 33 | lucasjvds/Scanipy | Scanipy stands for "scan it with Python"—it's your smart Python library for... | | Experimental |
| 34 | Danishmk1286/WCAG-Contrast-Checker-Ai | AI powered WCAG contrast checker that not only detects failures but fixes... | | Experimental |
| 35 | shawnacontrary24/DocStripper | 🧹 Clean up your documents with DocStripper, the AI-powered tool that removes... | | Experimental |
| 36 | danielbusnz-lgtm/inkvault | AI-powered document processing pipeline with Claude, FastAPI, and AWS | | Experimental |
| 37 | sjvrensburg/railreader2 | Desktop PDF viewer optimised for high magnification viewing. | | Experimental |
| 38 | 6825972/a11y-tw-audit-skill | Audit Taiwan websites for accessibility issues using WCAG 2.2 AA and local... | | Experimental |
| 39 | rooo1942/wireframe-ui | 🛠️ Build wireframe components directly in your IDE and streamline mockup... | | Experimental |
| 40 | amr122deqw/google-form-history | 📝 Track your Google Form responses easily with this Chrome extension,... | | Experimental |
| 41 | Avuii/DocuMind-AI | In Progress — Document Intelligence MVP for invoices & receipts... | | Experimental |
| 42 | itssharmaXD/numbers-le | 🔢 Extract numbers swiftly from JSON, YAML, CSV, TOML, INI, and ENV files at... | | Experimental |
| 43 | Deathfrosthacker/Accessibility-Text-Enhancer | ✨ Enhance web accessibility in real-time with this browser extension that... | | Experimental |
| 44 | cstroie/DocMindAI | A comprehensive PHP-based AI toolkit for intelligent document processing and... | | Experimental |
| 45 | reisel-g/doc2dataset | 📄 Ingest documents into structured datasets for LLMs, ensuring numeric... | | Experimental |
| 46 | sumitsahoo/erd-to-ddl | Generate DDLs from ER Diagrams using OpenAI Vision | | Experimental |
| 47 | Mato989086/AI-INVOICE-OCR-ENGINE | 🤖 Streamline invoice processing with this AI-powered OCR engine for accurate... | | Experimental |
| 48 | shreastharaj/PasteClip | Manage and access your macOS clipboard history with PasteClip, a lightweight... | | Experimental |
| 49 | Biellgrimm/itbaa | 📄 Convert HTML to high-quality PDF with Itbaa, supporting vector output,... | | Experimental |
| 50 | rogue-agent1/htmlstrip | htmlstrip - Strip HTML tags and extract text content | | Experimental |
| 51 | MukundaKatta/ClipBoard | Clipboard history manager — smart snippets with search, tagging, content... | | Experimental |
| 52 | Outofplace-tobacconist674/deeplens | Analyze EVM blockchain data on-chain to provide clear intelligence and... | | Experimental |
| 53 | MukundaKatta/SketchFlow | Wireframe-to-code converter — generate HTML/CSS from structured component... | | Experimental |
| 54 | ShaunakSen/AI-for-Web-Accessibility | This is the GitHub repository for my Masters dissertation titled: Artificial... | | Experimental |
| 55 | gliff-ai/style | gliff.ai STYLE – a user-interface pattern gallery documenting themes and... | | Experimental |
| 56 | Uli-Z/autoPDFtagger | autoPDFtagger is a Python tool designed for efficient home-office... | | Experimental |
| 57 | ypratap11/invoice-processing-ai | AI-powered invoice processing system using Google Document AI - Automated AP... | | Experimental |
| 58 | Bilal-03/invoice-extraction | AI-powered invoice data extraction using Computer Vision and NLP. Automates... | | Experimental |
| 59 | Yashsonaar/LayoutLMv3-Fine-Tuning | Welcome to the LayoutLMv3 Fine-Tuning project! 🚀 This project focuses on... | | Experimental |
| 60 | butlerlabs/docai | DocAI helps developers quickly build document, image and text processing... | | Experimental |
| 61 | Aid-On/templex | Template Extractor - Extract abstract templates and document structures from... | | Experimental |
| 62 | Bharathyalagi/OCR-Document-parser | Smart OCR application built with Tesseract and Streamlit that extracts... | | Experimental |
| 63 | JuanCS-Dev/typecraft | AI-Powered Book Production Engine - Transform manuscripts into... | | Experimental |
| 64 | kyritzb/AI-Low-Bandwidth-Video-Call | Video chat powered by artificial intelligence to make video chat over 30... | | Experimental |
| 65 | dev-luckymhz/AIVisionText-invoice-OCR-typescript | AIVisionText is an advanced document analysis platform that harnesses the... | | Experimental |
| 66 | Phu1237/extension-scan2ai | Discover a smarter way to interact with your screen! Scan2AI is a... | | Experimental |
| 67 | hrushikesh009/TensorFlow-OCR-Invoice-Extractor | A TensorFlow OCR solution, leveraging advanced object detection models like... | | Experimental |
| 68 | ChanMeng666/emoji-story-generator | 【Sprinkle some star dust on this repo!⭐️】An interactive web application that... | | Experimental |
| 69 | halilxibrahim/ai-logo-generator-webapp | AI Logo generator Web App | | Experimental |
| 70 | mlemineb/Document-Analyzer-App | A shiny application that analyzes financial documents (pdf format) using NLP... | | Experimental |
| 71 | AnujKumar883/ScanForge | 🔍 ScanForge simplifies document scanning and management, enhancing your... | | Experimental |
| 72 | gcb/artificial-clippy | Homage to the O.G. digital assistant (which nobody wanted, but everybody got anyway.) | | Experimental |
| 73 | stochastic-sisyphus/text-feature-span-extractor | Deterministic invoice extraction using native PDF text layers. No OCR... | | Experimental |
| 74 | UnderTheTableHTV7/simplai_HTV7 | A website application that uses NLP and Artificial Intelligence to recognize... | | Experimental |
| 75 | sangpham06112004/ScanForge | 🛠️ Simplify and automate code scanning to enhance security and streamline... | | Experimental |
| 76 | NhanPhamThanh-IT/Scan-PDF-Paper | Advanced document analysis platform that extracts text from PDF, DOCX, and... | | Experimental |
| 77 | angelpro17/Media-AI-Processor | Media-AI-Processor is a scalable media processing engine built with FastAPI... | | Experimental |
| 78 | graceytl/ai-receipt-data-extraction | AI & ML research project for automatic product extraction, classification,... | | Experimental |
| 79 | Muhib-Hasan/invoice-processor | 📄 Process Vietnam e-invoices seamlessly with multi-format support and a... | | Experimental |
| 80 | conditionedstimulus/DocumentClassifier | FastAPI application for document classification using a multimodal LayoutLM... | | Experimental |
| 81 | Komorebirumu/awe-ms-20260326-1002-00 | AI Personalized Children's Stories & Images | | Experimental |
| 82 | Stravinskyopticalglass907/papertrail | Extract key insights from PDFs page by page with AI-powered summaries and... | | Experimental |
| 83 | Zakwani123/rihal-docfusion | Receipt extraction and anomaly detection pipeline — OCR, Random Forest, Streamlit UI | | Experimental |
| 84 | Garendra/qwen3-2b-ocr-app | 🖼️ Extract text from PDF documents using Qwen3-2B-VL with a Docker setup and... | | Experimental |
| 85 | anuja024/AI-ddr-report-generator | AI-powered system that generates automated Defect Detection Reports (DDR)... | | Experimental |
| 86 | sarawagh27/smart-ai-file-organizer | AI-powered file organizer that automatically classifies and moves PDF, DOCX,... | | Experimental |
| 87 | FajarSangTrader/text-feature-span-extractor | 📄 Extract features from invoices using a robust text-layer span extractor.... | | Experimental |
| 88 | ruban-ai/deep-learning-accessibility-audit | Deep learning system for automated accessibility analysis of digital content... | | Experimental |
| 89 | man2k/AI-PDFReader | AI PDF Reader | | Experimental |
| 90 | SartHak-0-Sach/Podcastr-AI_based_podcast_generation_application | The AI Podcast Platform is a state-of-the-art AI SaaS platform that empowers... | | Experimental |
| 91 | onify/blueprint-aws-textract-pdf-to-form | Onify Blueprint: Amazon AWS Textract - PDF to form example | | Experimental |
| 92 | ICan-js/ICan.js | Library for adding more accessibility to web pages through... | | Experimental |
| 93 | texta-tk/texta-ui | Front-End for the RESTful implementation of Texta Toolkit | | Experimental |
| 94 | THILLAINATARAJAN-B/Maptizer | Geo-AI platform for real-time location intelligence, business viability, and... | | Experimental |
| 95 | BluShooz/nauknauk-clone | AI-powered platform that transforms toy/figure photos into animated videos.... | | Experimental |
| 96 | apple-fritter/digits | Powerful tool designed to clean and preprocess plaintext files; Remove... | | Experimental |
| 97 | Traviseric/parallel-book-generation | Parallel AI Book Generation Architecture - Generate complete books in under... | | Experimental |
| 98 | Avielzi/ScanMaster-AI | ScanMaster AI | | Experimental |
| 99 | abishekmuthian/dsc-automation | Automate disability support committee in universities. | | Experimental |