tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
Combines an LSTM-based neural network recognizer with the legacy character-pattern-matching engine, supports 100+ languages, and writes multiple output formats (hOCR, PDF, TSV, ALTO). Exposes C and C++ APIs through `libtesseract` for embedding in applications; language-specific models ship as traineddata files that can be swapped per run. Built on Leptonica for image I/O, with the page segmentation mode and OCR engine selectable via command-line flags.
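As a rough illustration of the command-line flags mentioned above, the following Python sketch assembles an argument list for the `tesseract` binary. The helper name `build_tesseract_command` and its defaults are hypothetical; the flags (`-l`, `--psm`, `--oem`), the value ranges, and the output config names reflect the Tesseract 4/5 CLI help as I understand it and may differ in other versions.

```python
from typing import List

# Output config names assumed to ship with Tesseract 4/5 (tessdata/configs).
OUTPUT_CONFIGS = {"txt", "hocr", "pdf", "tsv", "alto"}


def build_tesseract_command(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    psm: int = 3,
    oem: int = 3,
    output_format: str = "txt",
) -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    psm: page segmentation mode, assumed range 0-13.
    oem: OCR engine mode, assumed range 0-3
         (0 = legacy, 1 = LSTM, 2 = both, 3 = default).
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"psm out of range: {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem out of range: {oem}")
    if output_format not in OUTPUT_CONFIGS:
        raise ValueError(f"unknown output format: {output_format}")
    return [
        "tesseract", image_path, output_base,
        "-l", lang,              # selects the traineddata file(s) to load
        "--psm", str(psm),
        "--oem", str(oem),
        output_format,           # config name that picks the output renderer
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no tesseract install is needed.
    print(" ".join(build_tesseract_command("page.png", "out", lang="deu", psm=6, oem=1, output_format="hocr")))
```

With Tesseract installed, the returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with file names.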
72,883 stars. Actively maintained with 1 commit in the last 30 days.
Stars: 72,883
Forks: 10,541
Language: C++
License: Apache-2.0
Category:
Last pushed: Feb 28, 2026
Commits (30d): 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/tesseract-ocr/tesseract"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
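The same request can be sketched in Python using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from typing import Any

# Base path taken from the curl example; everything after it is assumed
# to follow a category/owner/repo layout.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository (hypothetical helper)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> Any:
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to perform the request.
    print(quality_url("ml-frameworks", "tesseract-ocr", "tesseract"))
```

Unauthenticated calls count against the 100 requests/day limit, so caching responses locally is worth considering when polling several repositories.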
Related frameworks
naptha/tesseract.js
Pure Javascript OCR for more than 100 Languages 📖🎉🖥
open-mmlab/mmocr
OpenMMLab Text Detection, Recognition and Understanding Toolbox
mayocream/koharu
ML-powered manga translator, written in Rust.
mindspore-lab/mindocr
A toolbox of ocr models and algorithms based on MindSpore
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.