naptha/tesseract.js
Pure Javascript OCR for more than 100 Languages 📖🎉🖥
Wraps the Tesseract OCR engine as WebAssembly for client-side and server-side text extraction, with worker-based concurrency for processing images in parallel. Language models are downloaded on first run (now 50-73% smaller than v5), and output is available in multiple formats, including hOCR and granular block-level data. Runs in browsers (via CDN, webpack, or ESM) and in Node.js v16+. PDF support and model optimization are out of scope.
37,920 stars and 3,951,624 monthly downloads. Used by 5 other packages. Available on npm.
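The worker-based concurrency mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the project's recommended setup: it assumes the `createWorker` and `createScheduler` exports of tesseract.js, and the helper name `recognizeAll`, its `workerCount` default, and the `tesseract` injection option are hypothetical names introduced here for the example.

```javascript
// Sketch: OCR several images in parallel using a small pool of workers.
// `recognizeAll` and its options are hypothetical; only createWorker,
// createScheduler, addWorker, addJob and terminate come from tesseract.js.
async function recognizeAll(images, { workerCount = 2, lang = 'eng', tesseract } = {}) {
  // Load the library lazily; `tesseract` lets a caller supply a stand-in.
  const { createWorker, createScheduler } = tesseract || require('tesseract.js');
  const scheduler = createScheduler();
  try {
    // Never start more workers than there are images to process.
    const poolSize = Math.min(workerCount, images.length);
    for (let i = 0; i < poolSize; i++) {
      // The first createWorker call downloads the language model if needed.
      scheduler.addWorker(await createWorker(lang));
    }
    // The scheduler hands each queued job to the next idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Always release the workers, even if a job failed.
    await scheduler.terminate();
  }
}

// Example call (file names are placeholders):
// recognizeAll(['page1.png', 'page2.png']).then((texts) => console.log(texts));
```

A scheduler is used here rather than calling `recognize` on a single worker because one worker processes jobs one at a time; whether two workers is a sensible default depends on the available cores and memory.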
Stars: 37,920
Forks: 2,363
Language: JavaScript
License: Apache-2.0
Category:
Last pushed: Feb 28, 2026
Monthly downloads: 3,951,624
Commits (30d): 0
Dependencies: 9
Reverse dependents: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/naptha/tesseract.js"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
open-mmlab/mmocr
OpenMMLab Text Detection, Recognition and Understanding Toolbox
mayocream/koharu
ML-powered manga translator, written in Rust.
lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
mindspore-lab/mindocr
A toolbox of ocr models and algorithms based on MindSpore
zyddnys/manga-image-translator
Translate manga/image: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)