Document Data Extraction Generative AI Tools

Tools for extracting, parsing, and structuring data from documents (PDFs, images, business cards, invoices, tenders) using OCR and AI. Includes document intelligence, tabular data extraction, and field recognition. Does NOT include document summarization, general document Q&A without structured extraction, or legal/thematic document analysis.

There are 36 document data extraction tools tracked. The highest-rated is gmp007/PropertyExtractor at 31/100 with 13 stars.

Get the projects as JSON. The example request below sets limit=20, so it returns at most 20 of the 36 entries.

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=generative-ai&subcategory=document-data-extraction&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
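The request above can be wrapped in a small shell helper so that the domain, subcategory, and limit become parameters instead of a hand-edited query string. This is a minimal sketch. The endpoint and the three query parameters come from the curl example. The function names build_url and fetch_dataset are hypothetical. The values are assumed to be URL-safe slugs, so no encoding is applied. The shape of the JSON response is not documented here, so the helper passes it through unchanged.

```shell
#!/usr/bin/env bash
# Sketch of a parameterized client for the dataset endpoint shown above.
# Assumptions: parameter values are URL-safe slugs (no percent-encoding is done),
# and the JSON response is passed through untouched because its schema is unknown.
set -eu

BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# build_url DOMAIN SUBCATEGORY LIMIT
# Prints the full query URL for the given domain, subcategory, and result limit.
build_url() {
  local domain="$1" subcategory="$2" limit="$3"
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "$domain" "$subcategory" "$limit"
}

# fetch_dataset DOMAIN SUBCATEGORY LIMIT
# Requests the URL built above and writes the raw JSON body to stdout.
# --fail makes curl exit non-zero on HTTP errors (e.g. when a daily quota is exhausted).
fetch_dataset() {
  curl --fail --silent --show-error "$(build_url "$@")"
}
```

For example, `fetch_dataset generative-ai document-data-extraction 20 | python3 -m json.tool` pretty-prints the response. Passing a larger limit may return all 36 entries, but the page does not say whether the API caps this value. Each call counts against the keyless quota of 100 requests/day.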

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | gmp007/PropertyExtractor | Generative AI-based Software for Material Property and Database Generation | 31 | Emerging |
| 2 | john-ng-hk/Biz-card-scanner | A digital repository for your physical business cards | 28 | Experimental |
| 3 | AdritPal08/universal-web-scraper-using-generative-ai | Effortless data extraction, powered by generative AI | 27 | Experimental |
| 4 | ryanmcdonough/lexplore | A tool for extracting data from legal documents | 25 | Experimental |
| 5 | jWinman91/AI-OCR | An AI-powered but model-agnostic optical character recognition (OCR) tool | 25 | Experimental |
| 6 | jWinman91/AI-OCR-Frontend | An AI-powered but model-agnostic optical character recognition (OCR) tool (frontend) | 24 | Experimental |
| 7 | thehackersplaybook/thp-ocr | THP-OCR: A simple Gen AI-powered OCR tool. 🍁 | 23 | Experimental |
| 8 | law4percent/CheckMe | CheckMe eliminates manual paper checking by using a flatbed scanner,... | 22 | Experimental |
| 9 | 100ravSingh/ChequeScan | My Gen AI deployment | 22 | Experimental |
| 10 | kaifcoder/Invoice-Query-Tool-using-gemini-ai | This repository contains a Python project that leverages the Gemini Pro... | 21 | Experimental |
| 11 | viochris/Streamlit-SpendSense | 💸 SpendSense: An AI-powered personal finance tracker built with Streamlit.... | 19 | Experimental |
| 12 | bejranonda/MeterVision | 👁️ MeterVision: Enterprise-grade meter infrastructure management with a... | 19 | Experimental |
| 13 | Anshu-312/llm_structured_extractor | Extract structured ticket fields from text using OpenRouter LLM with strict... | 19 | Experimental |
| 14 | codedbyasim/Generative-AI-Document-Intelligence-System | Extract and summarise data from PDFs and images using OCR + LLMs. Built with... | 16 | Experimental |
| 15 | Wilson0406/Self-Improving-LLM-Agent | A dual-agent, feedback-driven document extraction system using GPT-5 and... | 15 | Experimental |
| 16 | jagratadeb/GenAI-UiPath-TextExtractor | UiPath automation using OCR and GenAI to extract key data from scanned... | 15 | Experimental |
| 17 | Akhand-Pratap-Tiwari/Cyber-Alertz-web-scrapping-microservice | Flask app for scraping a cybersecurity website and purifying the raw content... | 15 | Experimental |
| 18 | codeterrayt/Scalable-Genai-Invoice-PDF-Data-Extractor | Scalable GenAI-powered system to extract structured invoice data from PDFs &... | 15 | Experimental |
| 19 | 0ameyasr/DocVal-Mini | Insurance Document Validation with Gemini AI + FastAPI | 15 | Experimental |
| 20 | francesco-s/document-claim-mapping | A tool using LLMs and few-shot learning for document-claim mapping and... | 14 | Experimental |
| 21 | Anthtrax/AIcheck | 📸 Streamline your study process with AIcheck, a quick job-checking tool that... | 14 | Experimental |
| 22 | Naresh1401/Intelligent-document-processing | LLM-powered document processing: extract structured data from invoices,... | 14 | Experimental |
| 23 | amikrsin/StatementSync-Lite | StatementSync is a lightweight, high-performance Progressive Web App (PWA)... | 14 | Experimental |
| 24 | artyuan/smart-receipt-assistant | Reads market invoices to extract and analyze spending data. Tracks prices of... | 13 | Experimental |
| 25 | raihan-karim-ishmam/NLP-Pipeline-for-Document-Intelligence-Public | This project is a high-performance, fully offline AI pipeline for... | 12 | Experimental |
| 26 | MasterChief-ai/AI-Dataset-Analysis-Tool | An AI-powered dataset analysis tool that automatically classifies tasks... | 12 | Experimental |
| 27 | RajhansJain/MULTI-LANGUAGE-INVOICE-EXTRACTOR-LLM | AI-powered invoice understanding system using Vision + LLMs (Gemini API).... | 11 | Experimental |
| 28 | Phoenixcoder-6/po-automation | This project automates the extraction, parsing, and structuring of purchase... | 11 | Experimental |
| 29 | Chaitanyakrishna294/Myntra_Genai | Myntra review analysis using GenAI | 11 | Experimental |
| 30 | codewithdark-git/TrustChecker | An AI-powered website content verification system that analyzes web pages... | 11 | Experimental |
| 31 | Debjyoti2004/PhotoCheck-AI | An intelligent web application that instantly verifies if a passport photo... | 11 | Experimental |
| 32 | Suriya-Prakashar/AI-driven-tender-scrutiny-system-for-NLCI | AI-powered system for NLC India Limited to automate tender scrutiny. Uses... | 11 | Experimental |
| 33 | het953/AI-Web-Scraper | An intelligent web scraping tool built with Streamlit, Selenium, and... | 11 | Experimental |
| 34 | kmaurinjones/Housing-Law-Insight | Web application designed to showcase the potential of Data Science and... | 11 | Experimental |
| 35 | dhcgn/anthropic-paperless-ngx-ocr | AnthropicPaperOCR is a CLI tool that extracts text from PDFs using advanced... | 10 | Experimental |
| 36 | dvp-git/gemini-information-extractor | A simple single-interface information extractor app using the latest... | 10 | Experimental |