tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

/ 100

Established

Combines LSTM-based neural network recognition with the legacy character pattern matching engine and supports 100+ languages and multiple output formats (hOCR, PDF, TSV, ALTO). Exposes C and C++ APIs through `libtesseract` for embedding in applications; language-specific models ship as traineddata files that can be swapped per run. Built on Leptonica for image I/O. Page segmentation mode and OCR engine mode are selectable via the `--psm` and `--oem` command-line flags.
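As a rough illustration of the command-line options described above, the sketch below assembles a `tesseract` invocation from a language, a page segmentation mode, an engine mode, and an output format. The helper name `build_tesseract_command` and the file names are hypothetical and chosen only for this example; the flags `-l`, `--psm`, `--oem` and the trailing output config names (`hocr`, `pdf`, `tsv`, `alto`) are the ones the CLI accepts. The sketch only builds the argument list and does not run the binary.

```python
# Output formats named in the description. The tesseract CLI selects them
# by passing the config name as a trailing argument.
OUTPUT_CONFIGS = ("hocr", "pdf", "tsv", "alto")


def build_tesseract_command(image, output_base, lang="eng", psm=3, oem=1, fmt="tsv"):
    """Return the argv list for one tesseract run (hypothetical helper).

    psm: page segmentation mode, 0-13 (for example 6 = single uniform text block).
    oem: OCR engine mode, 0-3 (0 = legacy only, 1 = LSTM only,
         2 = legacy + LSTM, 3 = default based on what is available).
    """
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    if fmt not in OUTPUT_CONFIGS:
        raise ValueError(f"unsupported output format: {fmt}")
    return [
        "tesseract", image, output_base,
        "-l", lang,          # selects the traineddata file, e.g. eng.traineddata
        "--psm", str(psm),
        "--oem", str(oem),
        fmt,                 # output config: hocr, pdf, tsv, or alto
    ]


if __name__ == "__main__":
    # "page.png" and "page" are placeholder names for this example.
    cmd = build_tesseract_command("page.png", "page", lang="eng", psm=6, oem=1, fmt="hocr")
    print(" ".join(cmd))
    # prints: tesseract page.png page -l eng --psm 6 --oem 1 hocr
```

The returned list can be passed to `subprocess.run` on a machine where the `tesseract` binary and the requested traineddata file are installed.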

72,883 stars. Actively maintained with 1 commit in the last 30 days.

No Package, No Dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 22 / 25


Stars: 72,883

Forks: 10,541

Language: C++

License: Apache-2.0


Related frameworks

naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

open-mmlab/mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox

mayocream/koharu

ML-powered manga translator, written in Rust.

mindspore-lab/mindocr

A toolbox of ocr models and algorithms based on MindSpore

lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.
