Document Intelligence Extraction ML Frameworks

Tools for extracting, analyzing, and structuring data from documents (PDFs, images, administrative files) using OCR, deep learning, and NLP. Includes document management, parsing, and information retrieval. Does NOT include general document conversion, presentation generation, or book production/typesetting.

There are 99 document intelligence extraction frameworks tracked. 4 score 50 or above (Established tier). The highest-rated is paperless-ngx/paperless-ngx at 69/100 with 37,318 stars. Only 1 of the top 10 is actively maintained.

Get all 99 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=document-intelligence-extraction&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
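The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response schema is not documented on this page (so the result is returned as parsed JSON without assuming any field names), and whether the endpoint accepts a `limit` larger than the 20 shown in the curl example is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL; `limit` defaults to the value in the curl example."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "document-intelligence-extraction",
        "limit": limit,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_projects(limit=20, timeout=30):
    """Fetch the dataset and return the parsed JSON body.

    No API key is sent, so the keyless daily quota applies.
    The shape of the returned object is not assumed here.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` issues the same request as the curl command. Passing a larger `limit` (for example 99) may or may not return every tracked project, depending on how the API caps page size.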

# | Framework | Description | Score | Tier
--- | --- | --- | --- | ---
1 | paperless-ngx/paperless-ngx | A community-supported supercharged document management system: scan, index... | 69 | Established
2 | GoogleCloudPlatform/document-ai-samples | Sample applications and demos for Document AI, the end-to-end document... | 63 | Established
3 | aphp/edspdf | EDS-PDF is a generic, pure-Python framework for text extraction from PDF... | 50 | Established
4 | aws-solutions/document-understanding-solution | Example of integrating & using Amazon Textract, Amazon Comprehend, Amazon... | 50 | Established
5 | naiveHobo/InvoiceNet | Deep neural network to extract intelligent information from invoice documents. | 49 | Emerging
6 | jonaswinkler/paperless-ng | A supercharged version of paperless: scan, index and archive all your... | 44 | Emerging
7 | ptmrio/autorename-pdf | autorename-pdf is a highly efficient tool designed to automatically rename... | 44 | Emerging
8 | kiku-jw/DocStripper | 🧹 DocStripper is a lightweight CLI utility that automatically cleans text documents | 37 | Emerging
9 | jennis0/burdoc | Advanced PDF parsing for python | 37 | Emerging
10 | AkshayG999/MistralOCR---AI-Powered-Document-Extraction | MistralOCR is an open-source application that transforms documents into... | 37 | Emerging
11 | unknownman1244/ai-humanizer-api | 🤖 Transform AI text into human-like writing with the AI Humanizer API,... | 36 | Emerging
12 | BananaPuke/pdf-brain | 📚 Index and enrich your PDFs and Markdown files locally for a powerful,... | 36 | Emerging
13 | StabRise/ScaleDP | ScaleDP is an Open-Source extension of Apache Spark for Document Processing | 35 | Emerging
14 | vladzima/neuronaming-dev | Open-source: AI powered business names generator. Proof of concept. | 35 | Emerging
15 | adhorn/poliko | Demo web applications that use AWS Artificial Intelligence services ... | 33 | Emerging
16 | study-assist/browser-extension | A tool to help you organise your bookmarks intelligently | 33 | Emerging
17 | MSUSAzureAccelerators/Intelligent-Document-Processing-Accelerator | Showcase Azure platform’s machine learning capability to recognize document... | 32 | Emerging
18 | Unstructured-IO/community | Open source libraries and APIs to build custom preprocessing pipelines for... | 32 | Emerging
19 | HT0710/Receipt-Information-Extraction | Receipt-Information-Extraction | 32 | Emerging
20 | aldawsarir/Vortex | AI-powered visual search and document understanding system that transforms... | 28 | Experimental
21 | Yosef-AlSabbah/Cloud-Based-Document-Analytics-Service | Cloud-based service for uploading, scraping, and managing PDF/DOCX... | 28 | Experimental
22 | bloomsburyai/ctrlf-tutorial | AI powered Ctrl-F using alpha.thecape.ai API | 28 | Experimental
23 | Ramtin-Karbaschi/enHumanizer_Bot | Transform AI-generated text to be indistinguishable from human writing.... | 28 | Experimental
24 | amanyagami/Make_Presentation_Simple.io | 📄➡️📊 Convert PDFs into AI-generated presentation decks using a fully... | 27 | Experimental
25 | machinelearningZH/ogd_ai-metafairy | An app that helps you easily create high quality dataset descriptions – with... | 27 | Experimental
26 | TheAkshatGupta/Intelligent-Document-Parsing-FinTech | NLP-based system to extract structured information from financial documents | 27 | Experimental
27 | jwc524/clippy | A smart PDF reader that extracts text and generates headings and summaries... | 26 | Experimental
28 | akshata29/digitalclaims | Microsoft Insurance Claims Automation, powered by AI, handles claim... | 26 | Experimental
29 | FzS92/smart-pdf-highlighter | Automatically identify and highlight key content within PDF files using... | 24 | Experimental
30 | shubhigupta991/PaperTxT | We plan to create an AI which has analytical reading and answering... | 24 | Experimental
31 | gliff-ai/audit | gliff.ai AUDIT – a user-friendly browser interface for exploring a fully... | 23 | Experimental
32 | jeson4535/ocrisp | 📄 Implement your RAG workflow effortlessly with this all-in-one tool for... | 23 | Experimental
33 | lucasjvds/Scanipy | Scanipy stands for "scan it with Python"—it's your smart Python library for... | 23 | Experimental
34 | Danishmk1286/WCAG-Contrast-Checker-Ai | AI powered WCAG contrast checker that not only detects failures but fixes... | 23 | Experimental
35 | shawnacontrary24/DocStripper | 🧹 Clean up your documents with DocStripper, the AI-powered tool that removes... | 23 | Experimental
36 | danielbusnz-lgtm/inkvault | AI-powered document processing pipeline with Claude, FastAPI, and AWS | 23 | Experimental
37 | sjvrensburg/railreader2 | Desktop PDF viewer optimised for high magnification viewing. | 22 | Experimental
38 | 6825972/a11y-tw-audit-skill | Audit Taiwan websites for accessibility issues using WCAG 2.2 AA and local... | 22 | Experimental
39 | rooo1942/wireframe-ui | 🛠️ Build wireframe components directly in your IDE and streamline mockup... | 22 | Experimental
40 | amr122deqw/google-form-history | 📝 Track your Google Form responses easily with this Chrome extension,... | 22 | Experimental
41 | Avuii/DocuMind-AI | In Progress — Document Intelligence MVP for invoices & receipts... | 22 | Experimental
42 | itssharmaXD/numbers-le | 🔢 Extract numbers swiftly from JSON, YAML, CSV, TOML, INI, and ENV files at... | 22 | Experimental
43 | Deathfrosthacker/Accessibility-Text-Enhancer | ✨ Enhance web accessibility in real-time with this browser extension that... | 22 | Experimental
44 | cstroie/DocMindAI | A comprehensive PHP-based AI toolkit for intelligent document processing and... | 22 | Experimental
45 | reisel-g/doc2dataset | 📄 Ingest documents into structured datasets for LLMs, ensuring numeric... | 22 | Experimental
46 | sumitsahoo/erd-to-ddl | Generate DDLs from ER Diagrams using OpenAI Vision | 22 | Experimental
47 | Mato989086/AI-INVOICE-OCR-ENGINE | 🤖 Streamline invoice processing with this AI-powered OCR engine for accurate... | 22 | Experimental
48 | shreastharaj/PasteClip | Manage and access your macOS clipboard history with PasteClip, a lightweight... | 22 | Experimental
49 | Biellgrimm/itbaa | 📄 Convert HTML to high-quality PDF with Itbaa, supporting vector output,... | 22 | Experimental
50 | rogue-agent1/htmlstrip | htmlstrip - Strip HTML tags and extract text content | 22 | Experimental
51 | MukundaKatta/ClipBoard | Clipboard history manager — smart snippets with search, tagging, content... | 22 | Experimental
52 | Outofplace-tobacconist674/deeplens | Analyze EVM blockchain data on-chain to provide clear intelligence and... | 22 | Experimental
53 | MukundaKatta/SketchFlow | Wireframe-to-code converter — generate HTML/CSS from structured component... | 22 | Experimental
54 | ShaunakSen/AI-for-Web-Accessibility | This is the GitHub repository for my Masters dissertation titled: Artificial... | 21 | Experimental
55 | gliff-ai/style | gliff.ai STYLE – a user-interface pattern gallery documenting themes and... | 21 | Experimental
56 | Uli-Z/autoPDFtagger | autoPDFtagger is a Python tool designed for efficient home-office... | 21 | Experimental
57 | ypratap11/invoice-processing-ai | AI-powered invoice processing system using Google Document AI - Automated AP... | 21 | Experimental
58 | Bilal-03/invoice-extraction | AI-powered invoice data extraction using Computer Vision and NLP. Automates... | 20 | Experimental
59 | Yashsonaar/LayoutLMv3-Fine-Tuning | Welcome to the LayoutLMv3 Fine-Tuning project! 🚀 This project focuses on... | 20 | Experimental
60 | butlerlabs/docai | DocAI helps developers quickly build document, image and text processing... | 19 | Experimental
61 | Aid-On/templex | Template Extractor - Extract abstract templates and document structures from... | 19 | Experimental
62 | Bharathyalagi/OCR-Document-parser | Smart OCR application built with Tesseract and Streamlit that extracts... | 18 | Experimental
63 | JuanCS-Dev/typecraft | AI-Powered Book Production Engine - Transform manuscripts into... | 18 | Experimental
64 | kyritzb/AI-Low-Bandwidth-Video-Call | Video chat powered by artificial intelligence to make video chat over 30... | 18 | Experimental
65 | dev-luckymhz/AIVisionText-invoice-OCR-typescript | AIVisionText is an advanced document analysis platform that harnesses the... | 18 | Experimental
66 | Phu1237/extension-scan2ai | Discover a smarter way to interact with your screen! Scan2AI is a... | 18 | Experimental
67 | hrushikesh009/TensorFlow-OCR-Invoice-Extractor | A TensorFlow OCR solution, leveraging advanced object detection models like... | 18 | Experimental
68 | ChanMeng666/emoji-story-generator | 【Sprinkle some star dust on this repo!⭐️】An interactive web application that... | 16 | Experimental
69 | halilxibrahim/ai-logo-generator-webapp | AI Logo generator Web App | 16 | Experimental
70 | mlemineb/Document-Analyzer-App | A shiny application that analyzes financial documents (pdf format) using NLP... | 16 | Experimental
71 | AnujKumar883/ScanForge | 🔍 ScanForge simplifies document scanning and management, enhancing your... | 15 | Experimental
72 | gcb/artificial-clippy | Homage to the O.G. digital assistant (which nobody wanted, but everybody got anyway.) | 15 | Experimental
73 | stochastic-sisyphus/text-feature-span-extractor | Deterministic invoice extraction using native PDF text layers. No OCR... | 15 | Experimental
74 | UnderTheTableHTV7/simplai_HTV7 | A website application that uses NLP and Artificial Intelligence to recognize... | 15 | Experimental
75 | sangpham06112004/ScanForge | 🛠️ Simplify and automate code scanning to enhance security and streamline... | 15 | Experimental
76 | NhanPhamThanh-IT/Scan-PDF-Paper | Advanced document analysis platform that extracts text from PDF, DOCX, and... | 15 | Experimental
77 | angelpro17/Media-AI-Processor | Media-AI-Processor is a scalable media processing engine built with FastAPI... | 14 | Experimental
78 | graceytl/ai-receipt-data-extraction | AI & ML research project for automatic product extraction, classification,... | 14 | Experimental
79 | Muhib-Hasan/invoice-processor | 📄 Process Vietnam e-invoices seamlessly with multi-format support and a... | 14 | Experimental
80 | conditionedstimulus/DocumentClassifier | FastAPI application for document classification using a multimodal LayoutLM... | 14 | Experimental
81 | Komorebirumu/awe-ms-20260326-1002-00 | AI Personalized Children's Stories & Images | 14 | Experimental
82 | Stravinskyopticalglass907/papertrail | Extract key insights from PDFs page by page with AI-powered summaries and... | 14 | Experimental
83 | Zakwani123/rihal-docfusion | Receipt extraction and anomaly detection pipeline — OCR, Random Forest, Streamlit UI | 14 | Experimental
84 | Garendra/qwen3-2b-ocr-app | 🖼️ Extract text from PDF documents using Qwen3-2B-VL with a Docker setup and... | 14 | Experimental
85 | anuja024/AI-ddr-report-generator | AI-powered system that generates automated Defect Detection Reports (DDR)... | 14 | Experimental
86 | sarawagh27/smart-ai-file-organizer | AI-powered file organizer that automatically classifies and moves PDF, DOCX,... | 14 | Experimental
87 | FajarSangTrader/text-feature-span-extractor | 📄 Extract features from invoices using a robust text-layer span extractor.... | 14 | Experimental
88 | ruban-ai/deep-learning-accessibility-audit | Deep learning system for automated accessibility analysis of digital content... | 14 | Experimental
89 | man2k/AI-PDFReader | AI PDF Reader | 14 | Experimental
90 | SartHak-0-Sach/Podcastr-AI_based_podcast_generation_application | The AI Podcast Platform is a state-of-the-art AI SaaS platform that empowers... | 13 | Experimental
91 | onify/blueprint-aws-textract-pdf-to-form | Onify Blueprint: Amazon AWS Textract - PDF to form example | 13 | Experimental
92 | ICan-js/ICan.js | Library for adding more accessibility to web pages through... | 12 | Experimental
93 | texta-tk/texta-ui | Front-End for the RESTful implementation of Texta Toolkit | 12 | Experimental
94 | THILLAINATARAJAN-B/Maptizer | Geo-AI platform for real-time location intelligence, business viability, and... | 11 | Experimental
95 | BluShooz/nauknauk-clone | AI-powered platform that transforms toy/figure photos into animated videos.... | 11 | Experimental
96 | apple-fritter/digits | Powerful tool designed to clean and preprocess plaintext files; Remove... | 11 | Experimental
97 | Traviseric/parallel-book-generation | Parallel AI Book Generation Architecture - Generate complete books in under... | 11 | Experimental
98 | Avielzi/ScanMaster-AI | ScanMaster AI | 11 | Experimental
99 | abishekmuthian/dsc-automation | Automate disability support committee in universities. | 10 | Experimental

Comparisons in this category