tesseract.js and tesseract

tesseract.js wraps the C++ Tesseract engine, packaged as WebAssembly, to bring OCR to browsers and Node.js. The two are complementary rather than competing: tesseract.js depends on tesseract as its underlying engine.

tesseract.js (Verified)

Maintenance 10/25 · Adoption 25/25 · Maturity 25/25 · Community 18/25

Stars: 37,920 · Forks: 2,363 · Downloads: 3,951,624 · Commits (30d): 0

Language: JavaScript · License: Apache-2.0

No risk flags

tesseract (Established)

Maintenance 13/25 · Adoption 10/25 · Maturity 16/25 · Community 22/25

Stars: 72,883 · Forks: 10,541 · Downloads: — · Commits (30d): 1

Language: C++ · License: Apache-2.0

No package · No dependents

About tesseract.js

naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

Based on the README: tesseract.js packages the Tesseract OCR engine as WebAssembly for client-side and server-side text extraction. It offers worker-based concurrency so that several images can be processed in parallel. Language models are downloaded on first run (now 50-73% smaller than v5). Output formats include plain text, hOCR, and granular block-level data. It runs in browsers (via CDN, webpack, or ESM) and in Node.js v16+. PDF support and model optimization are out of scope.
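The worker-based concurrency described above comes down to keeping several OCR workers busy from one queue of images. tesseract.js ships its own scheduler for this (`createScheduler`, `scheduler.addWorker`, `scheduler.addJob`). The sketch below is a library-free illustration of the same idea, not the tesseract.js implementation. `runPool` and `WorkerFn` are hypothetical names, and each "worker" is simply an async function. Idle workers claim the next job, and results come back in input order.

```typescript
// Hypothetical helper: a generic worker pool, NOT the tesseract.js scheduler API.
// Each "worker" is an async function; idle workers pull the next job from a shared queue.
type WorkerFn<J, R> = (job: J) => Promise<R>;

async function runPool<J, R>(jobs: J[], workers: WorkerFn<J, R>[]): Promise<R[]> {
  if (workers.length === 0) {
    throw new Error("runPool needs at least one worker");
  }
  const results: R[] = new Array(jobs.length);
  // Index of the next unclaimed job. No lock is needed because JavaScript
  // runs one callback at a time and `next++` happens synchronously.
  let next = 0;

  // One loop per worker: claim a job, await it, repeat until the queue is empty.
  // At most `workers.length` jobs are in flight at any moment.
  const loops = workers.map(async (work) => {
    while (next < jobs.length) {
      const i = next++;
      results[i] = await work(jobs[i]);
    }
  });

  await Promise.all(loops);
  return results; // in input order, whatever the completion order was
}
```

With tesseract.js v5 or later, each worker function could wrap a real worker obtained from `await createWorker("eng")`, for example `(img) => w.recognize(img).then((r) => r.data.text)`. In practice the built-in scheduler is the simpler route. The sketch only shows why N workers give up to N images in flight at once.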

About tesseract

tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

Combines an LSTM-based neural network recognizer with the legacy character-pattern-matching engine. It supports 100+ languages and writes multiple output formats (hOCR, PDF, TSV, ALTO). It exposes the `libtesseract` C and C++ APIs for embedding in applications, and traineddata files let you swap in language-specific models. It is built on Leptonica for image I/O. Page segmentation mode and OCR engine mode are selectable via command-line flags.
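The command-line flags mentioned above make the engine easy to drive from other programs. The sketch below assembles an argument list for the `tesseract` CLI from `-l`, `--psm`, and `--oem` plus an output-format config file name, following the `tesseract imagename outputbase [options...] [configfile...]` usage. It then runs the CLI from Node.js. It assumes Tesseract 4.1 or later is installed and on `PATH` with the requested traineddata available. `buildTesseractArgs`, `runTesseract`, and the option field names are hypothetical, not part of any published package.

```typescript
import { execFileSync } from "node:child_process";

// Output formats that ship as config files with Tesseract 4.1+ (passed as trailing arguments).
type OutputFormat = "txt" | "hocr" | "tsv" | "alto" | "pdf";

interface TesseractCliOptions {
  lang?: string;         // traineddata name(s), e.g. "eng" or "eng+deu"
  psm?: number;          // --psm: page segmentation mode, 0-13
  oem?: number;          // --oem: 0 legacy, 1 LSTM, 2 legacy + LSTM, 3 default
  format?: OutputFormat; // "txt" is the default output, so it adds no argument
}

// Hypothetical helper: builds argv for `tesseract imagename outputbase [options...] [configfile...]`.
function buildTesseractArgs(image: string, outputBase: string, opts: TesseractCliOptions = {}): string[] {
  const args = [image, outputBase];
  if (opts.lang !== undefined) args.push("-l", opts.lang);
  if (opts.psm !== undefined) {
    if (!Number.isInteger(opts.psm) || opts.psm < 0 || opts.psm > 13) {
      throw new RangeError(`--psm must be an integer 0-13, got ${opts.psm}`);
    }
    args.push("--psm", String(opts.psm));
  }
  if (opts.oem !== undefined) {
    if (!Number.isInteger(opts.oem) || opts.oem < 0 || opts.oem > 3) {
      throw new RangeError(`--oem must be an integer 0-3, got ${opts.oem}`);
    }
    args.push("--oem", String(opts.oem));
  }
  if (opts.format !== undefined && opts.format !== "txt") args.push(opts.format);
  return args;
}

// Hypothetical helper: runs the CLI with "stdout" as the output base and returns what it prints.
// Intended for text formats (txt, hocr, tsv, alto); throws if tesseract exits non-zero.
function runTesseract(image: string, opts: TesseractCliOptions = {}): string {
  return execFileSync("tesseract", buildTesseractArgs(image, "stdout", opts), { encoding: "utf8" });
}
```

For example, `runTesseract("page.png", { lang: "eng", psm: 6, format: "tsv" })` asks Tesseract to treat the image as a single uniform block of text and print word-level TSV. Validating the mode ranges before spawning gives a clearer error than the CLI's own usage message.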

Scores are updated daily from GitHub, PyPI, and npm data.