Document Data Extraction NLP Tools
There are 9 document data extraction tools tracked. One scores above 70 (verified tier). The highest-rated is google/langextract at 79/100, with 34,668 stars and 173,955 monthly downloads. Only 1 of the 9 is actively maintained.
Get all 9 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=document-data-extraction&limit=20"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
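The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client: the function names `build_quality_url` and `fetch_quality` are hypothetical, and because the response schema is not described here, the sketch simply decodes the JSON body and returns it unchanged. It makes anonymous requests only, so these count against the 100 requests/day limit. Key-based authentication is not shown because the page does not say how the key is sent.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; query parameters are built separately.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain, subcategory, limit=20):
    """Build the dataset-quality query URL (hypothetical helper name)."""
    # urlencode keeps dict insertion order, so the query string matches
    # the curl example parameter-for-parameter.
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30.0):
    """Fetch the URL anonymously and decode the JSON body.

    Assumption: the endpoint returns JSON. The structure of the decoded
    object is not documented here, so it is returned as-is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_quality_url("nlp", "document-data-extraction"))` issues the same request as the curl command. Keeping URL construction separate from the network call makes it easy to query other subcategories or change `limit` without touching the fetch logic.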
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | google/langextract | A Python library for extracting structured information from unstructured... | 79 | Verified |
| 2 | Extralit/extralit | Fast and accurate systemic data extraction with LLM assistance | | Emerging |
| 3 | Keyvanhardani/german-ocr | German-OCR is specifically trained to extract text from German documents... | | Emerging |
| 4 | oidlabs-com/Lexoid | Multimodal document parser for high quality data understanding and extraction | | Emerging |
| 5 | parsee-ai/parsee-core | Retrieval of fully structured data made easy. Use LLMs or custom models.... | | Emerging |
| 6 | xingbow/SciDaEx | Structured data extraction from research literature | | Emerging |
| 7 | davendw49/sciparser | PDF parsing toolkit for preparing academic text corpus | | Experimental |
| 8 | yaminivibha/LLM_InformationRetrieval | extracting "structured" information that is embedded in natural language... | | Experimental |
| 9 | GiftMungmeeprued/document-parsers-list | A comprehensive list of document parsers, covering PDF-to-text conversion... | | Experimental |