NanoNets/docstrange

Extract and convert data from any document (images, PDFs, Word documents, PowerPoint files, or URLs) into multiple formats (Markdown, JSON, CSV, HTML), with intelligent structured data extraction and advanced OCR.

68 / 100

Established

1,379 stars and 2,912 monthly downloads. Available on PyPI.

Maintenance 6 / 25

Adoption 18 / 25

Maturity 24 / 25

Community 20 / 25

Stars: 1,379
Forks: 125
Language: Python
License: MIT
Category: document-data-extraction
Last pushed: Oct 31, 2025
Monthly downloads: 2,912

Commits (30d)

Dependencies

Document Data Extraction · 74 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/NanoNets/docstrange"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
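The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the `owner/repo` path pattern is inferred from the single example URL above, and the response is assumed to be JSON (the page does not document the response schema or how an API key is passed, so neither is shown).

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper).

    Assumes the path is always <base>/<owner>/<repo>, as in the curl example.
    """
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint without a key and parse the body.

    Assumption: the body is JSON; its fields are not documented here,
    so the parsed object is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("NanoNets", "docstrange")` would issue the keyless request shown in the curl command. With the 100 requests/day limit, it may be worth caching the result rather than re-fetching on every run.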

Related tools

hashangit/Extract2MD

Extract2MD is a powerful and versatile AI-enabled client-side JavaScript library for extracting...

th1nhhdk/local_ai_ocr

A local, offline (after initial setup), portable OCR software that can process images and PDF...

Dicklesworthstone/llm_aided_ocr

Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...

emcf/thepipe

Get clean data from tricky documents, powered by vision-language models ⚡

langstruct-ai/langstruct

Extract structured data from any content using LLMs.
