Document Data Extraction Generative AI Tools
Tools for extracting, parsing, and structuring data from documents (PDFs, images, business cards, invoices, tenders) using OCR and AI. Includes document intelligence, tabular data extraction, and field recognition. Does NOT include document summarization, general document Q&A without structured extraction, or legal/thematic document analysis.
There are 36 document data extraction tools tracked. The highest-rated is gmp007/PropertyExtractor at 31/100 with 13 stars.
Get all 36 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=generative-ai&subcategory=document-data-extraction&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
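For scripted use, the same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the payload is returned untouched rather than parsed into fields.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="generative-ai",
              subcategory="document-data-extraction",
              limit=20):
    """Assemble the same query string that the curl command uses."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """GET the URL and decode the body as JSON.

    Assumption: the endpoint returns a JSON document. Its structure is
    not described on this page, so it is returned as-is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_projects(build_url())
    # Print only the start of the payload to keep the output short.
    print(json.dumps(data, indent=2)[:500])
```

The default `limit=20` mirrors the curl command. Whether a larger value returns all 36 projects in one response is not stated here, so treat that as something to check against the API. Unauthenticated requests count toward the 100 requests/day allowance.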
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | gmp007/PropertyExtractor<br>Generative AI-based Software for Material Property and Database Generation | | Emerging |
| 2 | john-ng-hk/Biz-card-scanner<br>A digital repository for your physical business cards | | Experimental |
| 3 | AdritPal08/universal-web-scraper-using-generative-ai<br>Effortless Data Extraction, Powered by : Generative AI | | Experimental |
| 4 | ryanmcdonough/lexplore<br>Tool to allow extraction of data from legal documents | | Experimental |
| 5 | jWinman91/AI-OCR<br>An AI-powered, but model-agnostic (Optical-Character-Recognition) OCR tool | | Experimental |
| 6 | jWinman91/AI-OCR-Frontend<br>An AI-powered, but model-agnostic (Optical-Character-Recognition) OCR tool (frontend) | | Experimental |
| 7 | thehackersplaybook/thp-ocr<br>THP-OCR: A simple Gen AI-powered OCR tool. 🍁 | | Experimental |
| 8 | law4percent/CheckMe<br>CheckMe eliminates manual paper checking by using a flatbed scanner,... | | Experimental |
| 9 | 100ravSingh/ChequeScan<br>My Gen AI deployment | | Experimental |
| 10 | kaifcoder/Invoice-Query-Tool-using-gemini-ai<br>This repository contains a Python project that leverages the Gemini Pro... | | Experimental |
| 11 | viochris/Streamlit-SpendSense<br>💸 SpendSense: An AI-powered personal finance tracker built with Streamlit.... | | Experimental |
| 12 | bejranonda/MeterVision<br>👁️ MeterVision: Enterprise-grade meter infrastructure management with a... | | Experimental |
| 13 | Anshu-312/llm_structured_extractor<br>Extract structured ticket fields from text using OpenRouter LLM with strict... | | Experimental |
| 14 | codedbyasim/Generative-AI-Document-Intelligence-System<br>Extract and summarise data from PDFs and images using OCR + LLMs. Built with... | | Experimental |
| 15 | Wilson0406/Self-Improving-LLM-Agent<br>A dual-agent, feedback-driven document extraction system using GPT-5 and... | | Experimental |
| 16 | jagratadeb/GenAI-UiPath-TextExtractor<br>UiPath automation using OCR and GenAI to extract key data from scanned... | | Experimental |
| 17 | Akhand-Pratap-Tiwari/Cyber-Alertz-web-scrapping-microservice<br>Flask app for scraping cybersecurity website and purify the raw content... | | Experimental |
| 18 | codeterrayt/Scalable-Genai-Invoice-PDF-Data-Extractor<br>Scalable GenAI-powered system to extract structured invoice data from PDFs &... | | Experimental |
| 19 | 0ameyasr/DocVal-Mini<br>Insurance Document Validation with Gemini AI + FastAPI | | Experimental |
| 20 | francesco-s/document-claim-mapping<br>A tool using LLMs and few-shot learning for document-claim mapping and... | | Experimental |
| 21 | Anthtrax/AIcheck<br>📸 Streamline your study process with AIcheck, a quick job-checking tool that... | | Experimental |
| 22 | Naresh1401/Intelligent-document-processing<br>LLM-powered document processing: extract structured data from invoices,... | | Experimental |
| 23 | amikrsin/StatementSync-Lite<br>StatementSync is a lightweight, high-performance Progressive Web App (PWA)... | | Experimental |
| 24 | artyuan/smart-receipt-assistant<br>Reads market invoices to extract and analyze spending data. Tracks prices of... | | Experimental |
| 25 | raihan-karim-ishmam/NLP-Pipeline-for-Document-Intelligence-Public<br>This project is a high-performance, fully offline AI pipeline for... | | Experimental |
| 26 | MasterChief-ai/AI-Dataset-Analysis-Tool<br>An AI-powered dataset analysis tool that automatically classifies tasks... | | Experimental |
| 27 | RajhansJain/MULTI-LANGUAGE-INVOICE-EXTRACTOR-LLM<br>AI-powered invoice understanding system using Vision + LLMs (Gemini API).... | | Experimental |
| 28 | Phoenixcoder-6/po-automation<br>This project automates the extraction, parsing, and structuring of purchase... | | Experimental |
| 29 | Chaitanyakrishna294/Myntra_Genai<br>Myntra review analysis using GenAI | | Experimental |
| 30 | codewithdark-git/TrustChecker<br>An AI-powered website content verification system that analyzes web pages... | | Experimental |
| 31 | Debjyoti2004/PhotoCheck-AI<br>An intelligent web application that instantly verifies if a passport photo... | | Experimental |
| 32 | Suriya-Prakashar/AI-driven-tender-scrutiny-system-for-NLCI<br>AI-powered system for NLC India Limited to automate tender scrutiny. Uses... | | Experimental |
| 33 | het953/AI-Web-Scraper<br>An intelligent web scraping tool built with Streamlit, Selenium, and... | | Experimental |
| 34 | kmaurinjones/Housing-Law-Insight<br>Web application designed to showcase the potential of Data Science and... | | Experimental |
| 35 | dhcgn/anthropic-paperless-ngx-ocr<br>AnthropicPaperOCR is a CLI tool that extracts text from PDFs using advanced... | | Experimental |
| 36 | dvp-git/gemini-information-extractor<br>A simple single interface information extractor app using the latest... | | Experimental |