naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

78 / 100 (Verified)

Wraps the Tesseract OCR engine as WebAssembly to enable client-side and server-side text extraction, with worker-based concurrency for parallel image processing. Language models are downloaded on first run (now 50-73% smaller than v5), and multiple output formats are supported, including hOCR and granular block-level data. Runs in browsers via CDN, webpack, or ESM, and in Node.js v16+. PDF support and model optimization are out of scope.
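The worker-based concurrency mentioned above might look roughly like the following sketch. It assumes the package is installed via `npm install tesseract.js` and uses the v5-style API (`createWorker`, `createScheduler`). The function name `recognizeInParallel`, the `workerCount` default, and the injectable `tesseract` parameter are illustrative choices, not part of the library.

```javascript
// Sketch: OCR several images in parallel with a pool of workers.
// Assumption: tesseract.js v5+ is installed. The module is loaded lazily
// through the default parameter so a stub can be injected for testing.
async function recognizeInParallel(images, workerCount = 2, tesseract = require('tesseract.js')) {
  const scheduler = tesseract.createScheduler();
  try {
    for (let i = 0; i < workerCount; i++) {
      // Each worker loads the English language data (downloaded on first run).
      scheduler.addWorker(await tesseract.createWorker('eng'));
    }
    // The scheduler hands each job to the next idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminates every worker that was added to the scheduler.
    await scheduler.terminate();
  }
}

// Usage (hypothetical file names):
// recognizeInParallel(['page1.png', 'page2.png']).then(console.log);
```

A scheduler is only worthwhile when there are several images to process. For a single image, one worker created with `createWorker` and a direct `worker.recognize(image)` call would be enough.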

37,920 stars and 3,951,624 monthly downloads. Used by 5 other packages. Available on npm.

Maintenance 10 / 25
Adoption 25 / 25
Maturity 25 / 25
Community 18 / 25


Stars: 37,920
Forks: 2,363
Language: JavaScript
License: Apache-2.0
Category: latex-ocr-tools
Last pushed: Feb 28, 2026
Monthly downloads: 3,951,624
Commits (30d): 0
Dependencies: 9
Reverse dependents: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/naptha/tesseract.js"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
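
The same request can be made from Node.js. The sketch below assumes Node.js 18+ (for the global `fetch`) and assumes the endpoint returns JSON. The response fields are not documented here, so the result is returned as-is. The helper names `buildQualityUrl` and `fetchQuality` are hypothetical.

```javascript
// Sketch: fetch the quality data shown on this page from the public API.
// Assumptions: Node.js 18+ (global fetch) and a JSON response body.
const API_BASE = 'https://pt-edge.onrender.com/api/v1/quality';

// Builds the endpoint URL from the three path segments used in the curl example.
function buildQualityUrl(category, owner, repo) {
  return [API_BASE, category, owner, repo]
    .map((part, i) => (i === 0 ? part : encodeURIComponent(part)))
    .join('/');
}

async function fetchQuality(category, owner, repo) {
  const res = await fetch(buildQualityUrl(category, owner, repo));
  if (!res.ok) {
    // Covers any non-2xx status, e.g. once the daily request limit is reached.
    throw new Error(`Request failed: HTTP ${res.status}`);
  }
  return res.json();
}

// Usage:
// fetchQuality('ml-frameworks', 'naptha', 'tesseract.js').then(console.log);
```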