axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

44 / 100

Emerging

Parsr uses a modular processing pipeline that combines OCR (Tesseract), table extraction (Camelot), and semantic hierarchy reconstruction to detect headings, lists, and overall document structure. It outputs the cleaned data as JSON, Markdown, CSV, or plain text through a REST API. Docker deployment is optional, and a Python client provides programmatic access.
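The heading-detection part of hierarchy reconstruction can be illustrated with a common heuristic. It treats the most frequent font size as body text and promotes each larger size to a heading level, largest first. The sketch below is a toy built on those assumptions. The `Line` type and the `to_markdown` function are hypothetical. They are not Parsr's API and do not reproduce its actual heading-detection module.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """One line of extracted text plus its font size (hypothetical input format)."""
    text: str
    font_size: float


def to_markdown(lines: list[Line]) -> str:
    """Toy heading reconstruction: sizes larger than body text become Markdown headings.

    Assumptions (not Parsr's actual algorithm): the most frequent font size is body
    text, and each distinct larger size maps to a heading level, biggest first (max 6).
    """
    if not lines:
        return ""
    # Body text is assumed to be the most common font size in the document.
    body_size = Counter(line.font_size for line in lines).most_common(1)[0][0]
    # Distinct sizes larger than the body, largest first, map to levels 1, 2, ...
    heading_sizes = sorted(
        {line.font_size for line in lines if line.font_size > body_size}, reverse=True
    )
    level_of = {size: min(i + 1, 6) for i, size in enumerate(heading_sizes)}

    out = []
    for line in lines:
        level = level_of.get(line.font_size)
        # Headings get '#' prefixes; body text and smaller text (e.g. footnotes) stay plain.
        out.append(f"{'#' * level} {line.text}" if level else line.text)
    return "\n".join(out)


# Usage: a tiny hypothetical document with a title, two section headings, and body text.
doc = [
    Line("Annual Report", 24),
    Line("Overview", 16),
    Line("Revenue grew in all regions.", 11),
    Line("Costs were flat.", 11),
    Line("Outlook", 16),
    Line("We expect modest growth.", 11),
]
print(to_markdown(doc))
```

With the sample input, size 11 is the body, 24 becomes `#`, and 16 becomes `##`. Real pipelines usually also weigh boldness, spacing, and position, which this sketch ignores.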

6,170 stars. No commits in the last 6 months.

Stale (6m) · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25


Stars: 6,170

Forks: 324

Language: JavaScript

License: Apache-2.0

Higher-rated alternatives

deepdoctection/deepdoctection

A Repo For Document AI

deanmalmgren/textract

extract text from any document. no muss. no fuss.

eikek/docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources...

zzzDavid/ICDAR-2019-SROIE

ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic...
