mary-lev/llm-ocr

An LLM-powered Python package for evaluating and correcting OCR output, with support for multiple language models.
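The description mentions OCR evaluation. A common metric for this is character error rate (CER): the Levenshtein edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below is a generic implementation, not llm-ocr's own API. The function names `levenshtein` and `character_error_rate` are hypothetical, and whether the package computes CER exactly this way is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """CER = edit distance / number of characters in the reference.
    Hypothetical helper for illustration, not part of llm-ocr."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


raw_ocr = "Tbe qnick fox"        # two misread characters
ground_truth = "The quick fox"   # 13 characters
print(character_error_rate(raw_ocr, ground_truth))
print(character_error_rate(ground_truth, ground_truth))
```

Comparing CER before and after LLM correction gives a simple, model-independent way to check whether the correction step actually helped.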

Quality score: 32 / 100 (Emerging)

This tool helps researchers, archivists, and historians convert scanned historical documents into digital text, including difficult sources such as old books. You supply page images (e.g., JPEGs) together with their ALTO XML layout files, and the system returns OCR text corrected by a language model. It is aimed at anyone who needs accurate digital transcriptions of physical documents for analysis or archiving.

No commits in the last 6 months.

Use this if you need to extract and correct text from scanned documents, especially ones with historical or complex layouts, and want to use large language models to improve on raw OCR output.

Not ideal if you only need basic OCR for modern, clean documents, or if you prefer not to rely on external large language model services.

historical-document-digitization archival-processing text-recognition digital-humanities document-analysis
Stale 6m | No Package | No Dependents
Maintenance 2 / 25
Adoption 3 / 25
Maturity 15 / 25
Community 12 / 25
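The four subscores above add up to the overall 32 / 100. Assuming (the page does not state the formula) that the headline score is a simple sum of four dimensions worth 25 points each, the arithmetic is:

```python
# Subscores as listed above; each dimension is assumed to be out of 25.
subscores = {"Maintenance": 2, "Adoption": 3, "Maturity": 15, "Community": 12}
MAX_PER_DIMENSION = 25

total = sum(subscores.values())                  # 2 + 3 + 15 + 12
maximum = MAX_PER_DIMENSION * len(subscores)     # 4 * 25
print(f"{total} / {maximum}")  # → 32 / 100
```

Read this way, most of the score comes from Maturity and Community, while Maintenance and Adoption are near zero, consistent with the "Stale 6m" flag and the low star count.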


Stars: 4
Forks: 1
Language: Python
License: MIT
Last pushed: Jun 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mary-lev/llm-ocr"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
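The same request can be made from Python with the standard library. This is a minimal sketch under two assumptions not confirmed by the page: that other repositories follow the same `/quality/<category>/<owner>/<repo>` path as the example URL, and that the endpoint returns JSON. The helper names are hypothetical, and no API key is sent because the page does not say how a key is passed.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Assumed path pattern, generalized from the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    # Response schema is not documented here, so the parsed JSON is
    # returned unchanged. Counts against the 100 requests/day keyless limit.
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For this repository, `fetch_quality("transformers", "mary-lev", "llm-ocr")` requests the same URL as the curl command.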