Rushi-Balapure/pdf_2_json_extractor
A high-performance Python library for extracting structured content from PDF documents with layout-aware text extraction. pdf_to_json preserves document structure including headings (H1-H6) and body text, outputting clean JSON format.
Stars
3
Forks
1
Language
Python
License
—
Category
Last pushed
Jan 06, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Rushi-Balapure/pdf_2_json_extractor"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
deepdoctection/deepdoctection
A Repo For Document AI
deanmalmgren/textract
extract text from any document. no muss. no fuss.
eikek/docspell
Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources...
zzzDavid/ICDAR-2019-SROIE
ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic...