Rushi-Balapure/pdf_2_json_extractor

A high-performance Python library for extracting structured content from PDF documents with layout-aware text extraction. pdf_to_json preserves document structure, including headings (H1-H6) and body text, and outputs clean JSON.
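The listing does not document how pdf_to_json decides heading levels. The sketch below shows one common layout-aware heuristic, offered only as an illustration and not as the library's actual method. The most frequent font size is treated as body text, and each larger distinct size becomes a heading level (largest = H1, capped at H6). The input format (a list of dicts with hypothetical `text` and `size` keys) and the function name `classify_spans` are assumptions, not part of pdf_to_json's API.

```python
import json
from collections import Counter


def classify_spans(spans, max_levels=6):
    """Label text spans as headings (H1-H6) or body text by font size.

    Hypothetical input: a list of dicts like {"text": str, "size": float},
    in reading order. Illustrative heuristic only, not pdf_to_json's API.
    """
    if not spans:
        return []
    # Assume the most frequent font size is the body-text size.
    body_size = Counter(s["size"] for s in spans).most_common(1)[0][0]
    # Distinct sizes larger than body text become heading levels, largest first.
    heading_sizes = sorted({s["size"] for s in spans if s["size"] > body_size}, reverse=True)
    # Map each heading size to a level, clamping anything past max_levels to H6.
    level_of = {size: min(i + 1, max_levels) for i, size in enumerate(heading_sizes)}

    blocks = []
    for span in spans:
        text = span["text"].strip()
        if not text:
            continue  # skip whitespace-only spans
        level = level_of.get(span["size"])
        blocks.append({"type": f"H{level}" if level else "body", "text": text})
    return blocks


if __name__ == "__main__":
    sample = [
        {"text": "Annual Report", "size": 24.0},
        {"text": "Overview", "size": 16.0},
        {"text": "Revenue grew this year.", "size": 11.0},
        {"text": "Costs were flat.", "size": 11.0},
    ]
    print(json.dumps(classify_spans(sample), indent=2))
```

Ranking sizes relative to the dominant body size keeps the heuristic independent of any particular document's absolute font sizes; text smaller than the body size (footnotes, captions) simply falls through to `body`.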

Overall rating: Emerging (scored out of 100)

No package · No dependents

Score breakdown:
Maintenance: 10 / 25
Adoption: 3 / 25
Maturity: 9 / 25
Community: 12 / 25



Language: Python
License: none listed
Category: document-ocr-extraction (Document OCR Extraction · 57 tools)
Last pushed: Jan 06, 2026

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Rushi-Balapure/pdf_2_json_extractor"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
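For scripted access, the same request can be made from Python's standard library. This is a minimal sketch that mirrors the curl command above. It assumes the path follows the pattern `quality/<category>/<owner>/<repo>` (inferred from that single example) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical. Passing an API key is not shown because the page does not say how the key is supplied.

```python
import json
import urllib.request

# Base path taken from the curl example; the category/repo layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL, e.g. category="nlp", repo="owner/name"."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10):
    """GET the endpoint and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `fetch_quality("nlp", "Rushi-Balapure/pdf_2_json_extractor")` issues the same unauthenticated request as the curl command, so it counts against the 100 requests/day limit.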

Higher-rated alternatives

deepdoctection/deepdoctection

A Repo For Document AI

deanmalmgren/textract

extract text from any document. no muss. no fuss.

eikek/docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources...

zzzDavid/ICDAR-2019-SROIE

ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic...
