mary-lev/llm-ocr

LLM-powered OCR evaluation and correction package that supports multiple language models for OCR processing and text correction tasks.

/ 100

Emerging

This tool helps researchers, archivists, and historians convert scanned historical documents or images into digital text, including challenging sources such as old books. You provide page images (for example, JPEGs) together with their corresponding ALTO XML layout files, and the system outputs corrected text. It is designed for anyone working with physical documents that need precise digital conversion for analysis or archiving.
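To make the ALTO XML input mentioned above more concrete, here is a minimal sketch of reading OCR text lines out of an ALTO file with the Python standard library. It is not the package's own API: the function names `local_name` and `alto_text_lines` and the `SAMPLE` document are hypothetical and exist only for illustration. The sketch relies on the ALTO convention that each `TextLine` element holds `String` elements whose `CONTENT` attribute carries the recognized word.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}TextLine'."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml: str) -> list[str]:
    """Return one string per ALTO TextLine, joining its String/@CONTENT values."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Only String children carry text; SP (space) and HYP elements are skipped.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Hypothetical two-line ALTO fragment used only to exercise the function.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Old"/><SP/><String CONTENT="book"/>
    </TextLine>
    <TextLine>
      <String CONTENT="page"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_text_lines(SAMPLE))  # ['Old book', 'page']
```

Matching on the local tag name rather than a fixed namespace keeps the sketch usable across ALTO schema versions, whose namespace URIs differ.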

No commits in the last 6 months.

Use this if you need to extract and correct text from scanned documents, especially those with historical or complex layouts, and want to use large language models to improve accuracy.

Not ideal if you only need basic OCR for modern, clean documents or if you prefer not to use external Large Language Model services.

historical-document-digitization archival-processing text-recognition digital-humanities document-analysis

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 3 / 25

Maturity 15 / 25

Community 12 / 25

Language Python

License MIT

Higher-rated alternatives

FuxiaoLiu/LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

kiyoshisasano/llm-failure-atlas

A graph-based failure modeling and deterministic detection system for LLM agent runtimes.

gwasiakshay/llm-eval-benchmark

LLM evaluation & benchmarking framework using LLM-as-a-judge scoring, multi-model comparison,...

useentropy/llmkit

LLM Kit - Python Large Language Model Kit for generating data of your choice

flamehaven01/CRoM-EfficientLLM

A Python toolkit to optimize LLM context by intelligently selecting, re-ranking, and...
