Danitilahun/Document-processing-Pdf-Structured-Data-Extractor
This project demonstrates how to extract structured information from PDF documents using a combination of LangChain, OpenAI models, and the Docling library. It provides a framework for parsing PDFs and using LLMs to identify and format key data points.
No commits in the last 6 months.
Stars: 3
Forks: 1
Language: Jupyter Notebook
License: not listed
Category:
Last pushed: Apr 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Danitilahun/Document-processing-Pdf-Structured-Data-Extractor"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
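The same request can be made from Python, which may be convenient given that the repository itself is a Jupyter Notebook project. The following is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the `nlp` path segment is simply copied from the curl command above, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is category/owner/repo.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner, repo, category="nlp"):
    """Build the endpoint URL in the same shape as the curl example.

    Hypothetical helper: "nlp" is the path segment seen in the curl command.
    """
    # Percent-encode each path segment so unusual characters cannot break the URL.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([API_BASE, *parts])


def fetch_quality(owner, repo, category="nlp", timeout=10.0):
    """Send a keyless GET request and parse the body.

    Assumption: the endpoint returns JSON; the field names are not documented here,
    so the parsed object is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(owner, repo, category),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("Danitilahun", "Document-processing-Pdf-Structured-Data-Extractor")` performs a live request and counts against the daily limit, so it is worth caching the result rather than polling.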
Higher-rated alternatives
google/langextract
A Python library for extracting structured information from unstructured text using LLMs with...
Extralit/extralit
Fast and accurate systemic data extraction with LLM assistance
Keyvanhardani/german-ocr
German-OCR is specifically trained to extract text from German documents including invoices,...
oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
xingbow/SciDaEx
Structured data extraction from research literature