tesseract.js and tesseract

tesseract.js wraps the Tesseract C++ engine, compiled to WebAssembly, to provide OCR in web and Node.js environments. The two are complementary: tesseract.js depends on tesseract as its underlying engine.

| | tesseract.js | tesseract |
|---|---|---|
| Score | 78 (Verified) | 61 (Established) |
| Maintenance | 10/25 | 13/25 |
| Adoption | 25/25 | 10/25 |
| Maturity | 25/25 | 16/25 |
| Community | 18/25 | 22/25 |
| Stars | 37,920 | 72,883 |
| Forks | 2,363 | 10,541 |
| Downloads | 3,951,624 | (not listed) |
| Commits (30d) | 0 | 1 |
| Language | JavaScript | C++ |
| License | Apache-2.0 | Apache-2.0 |
| Risk flags | No risk flags | No Package, No Dependents |

About tesseract.js

naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

Wraps the Tesseract OCR engine as WebAssembly to enable client-side and server-side text extraction, with worker-based concurrency for processing images in parallel. Downloads language models on first run (now 50-73% smaller than v5) and supports multiple output formats, including hOCR and granular block-level data. Runs in browsers via CDN, webpack, or ESM and in Node.js v16+. PDF input and recognition-model optimization are out of scope.
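The worker-based concurrency mentioned above can be illustrated with a minimal sketch. It assumes the `createWorker(lang)` and `createScheduler()` calls as they exist in recent tesseract.js releases (v5 and later); the function name `recognizeBatch`, the `poolSize` option, and the file names are hypothetical. The module is passed in as a parameter so the sketch does not depend on how the library is loaded.

```javascript
// Sketch: fan a batch of images out over a small pool of OCR workers.
// `tess` is assumed to be the tesseract.js module (v5+ API shape).
async function recognizeBatch(tess, images, { lang = 'eng', poolSize = 2 } = {}) {
  const scheduler = tess.createScheduler();
  try {
    // Each worker loads the language data once, then serves many jobs.
    for (let i = 0; i < poolSize; i++) {
      scheduler.addWorker(await tess.createWorker(lang));
    }
    // The scheduler hands each queued job to the next idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}

// Hypothetical usage in Node.js, assuming tesseract.js is installed:
//   const tess = require('tesseract.js');
//   recognizeBatch(tess, ['page1.png', 'page2.png']).then(console.log);
```

A pool larger than the number of available CPU cores is unlikely to help, since each worker runs its own copy of the engine.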

About tesseract

tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

Combines LSTM-based neural net recognition with legacy character pattern matching engines, supporting 100+ languages and multiple output formats (hOCR, PDF, TSV, ALTO). Exposes `libtesseract` C/C++ APIs for embedding in applications, with traineddata files enabling language-specific model swapping. Built on Leptonica for image I/O, it supports selecting the page segmentation mode and OCR engine via command-line flags.
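As a rough illustration of the command-line flags mentioned above, the sketch below assembles an argument list for the `tesseract` binary using its `-l`, `--psm`, and `--oem` options, followed by output config names such as `hocr` or `tsv`. The helper name `buildTesseractArgs` and the file names are hypothetical, and the sketch only builds the arguments; it does not run the engine.

```javascript
// Sketch: build the argument list for `tesseract <input> <outputbase> [options] [configs]`.
function buildTesseractArgs({ input, outputBase, lang = 'eng', psm, oem, formats = [] }) {
  const args = [input, outputBase, '-l', lang];
  // --psm selects the page segmentation mode (e.g. 6 = single uniform block of text).
  if (psm !== undefined) args.push('--psm', String(psm));
  // --oem selects the OCR engine mode (e.g. 1 = LSTM neural net only).
  if (oem !== undefined) args.push('--oem', String(oem));
  // Trailing config names choose the output formats (hocr, pdf, tsv, alto, txt).
  args.push(...formats);
  return args;
}

const args = buildTesseractArgs({
  input: 'scan.png',
  outputBase: 'out',
  psm: 6,
  oem: 1,
  formats: ['hocr', 'tsv'],
});
console.log(args.join(' '));
// prints: scan.png out -l eng --psm 6 --oem 1 hocr tsv

// Hypothetical usage, assuming the tesseract binary is on PATH:
//   require('child_process').execFile('tesseract', args, callback);
```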

Scores updated daily from GitHub, PyPI, and npm data.