naptha/tesseract.js

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

78 / 100

Verified

Wraps the Tesseract OCR engine as WebAssembly so that text can be extracted from images on the client or on the server, with worker-based concurrency for processing several images in parallel. Language data is downloaded on first use (the files are now 50-73% smaller than in v5), and output is available in multiple formats, including hOCR and granular block-level data. Runs in browsers via CDN, webpack, or ESM, and in Node.js v16+. PDF support and model optimization are out of scope.
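The worker-based concurrency mentioned above can be sketched as a small pool that hands images to whichever worker is free. This is a minimal illustration, not the library's own scheduler: `recognizeAll` and `makeTesseractWorker` are hypothetical names, and the factory assumes the v5+ call style of `createWorker('eng')` together with `worker.recognize(image)` resolving to `{ data: { text } }`.

```javascript
// Sketch: spread images over a small pool of OCR workers.
// `makeWorker` is a hypothetical factory; any object exposing
// recognize(image) -> Promise<{ data: { text } }> and terminate() will do.
async function recognizeAll(images, workerCount, makeWorker) {
  const count = Math.max(1, Math.min(workerCount, images.length));
  const workers = await Promise.all(
    Array.from({ length: count }, () => makeWorker())
  );
  const results = new Array(images.length);
  let next = 0; // shared cursor into `images`

  // Each worker keeps pulling the next unprocessed image until none remain.
  async function drain(worker) {
    while (next < images.length) {
      const index = next++;
      const { data } = await worker.recognize(images[index]);
      results[index] = data.text;
    }
  }

  try {
    await Promise.all(workers.map(drain));
  } finally {
    // Always release the workers, even if a recognition call failed.
    await Promise.all(workers.map((w) => w.terminate()));
  }
  return results;
}

// Assumed factory backed by tesseract.js (requires `npm install tesseract.js`).
// The module is loaded lazily so the pool itself has no hard dependency on it.
async function makeTesseractWorker() {
  const mod = await import('tesseract.js');
  const createWorker = mod.createWorker ?? mod.default.createWorker;
  return createWorker('eng'); // 'eng' language data is fetched on first use
}
```

A call such as `recognizeAll(['a.png', 'b.png'], 2, makeTesseractWorker)` would return the recognized text in input order. The library also ships its own `createScheduler` helper for the same purpose, which is likely the better choice in production code.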

37,920 stars and 3,951,624 monthly downloads. Used by 5 other packages. Available on npm.

Maintenance 10 / 25

Adoption 25 / 25

Maturity 25 / 25

Community 18 / 25

Stars 37,920

Forks 2,363

Language JavaScript

License Apache-2.0

Category latex-ocr-tools

Last pushed Feb 28, 2026

Monthly downloads 3,951,624

LaTeX OCR Tools · 54 frameworks

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/naptha/tesseract.js"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
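The same request can be made from Node.js 18+ with the built-in `fetch`. This is a sketch under assumptions: `qualityUrl` and `fetchQuality` are hypothetical helpers, the `category/owner/repo` path layout is inferred from the single URL shown above, and the body is returned as raw text because the response format is not documented here.

```javascript
// Base URL taken from the curl example; the path layout is an assumption.
const API_BASE = 'https://pt-edge.onrender.com/api/v1/quality';

// Build the endpoint URL for a repository in a given category.
function qualityUrl(category, owner, repo) {
  return [API_BASE, category, owner, repo]
    .map((part, i) => (i === 0 ? part : encodeURIComponent(part)))
    .join('/');
}

// Fetch the record and return the raw response body as text.
async function fetchQuality(category, owner, repo) {
  const response = await fetch(qualityUrl(category, owner, repo));
  if (!response.ok) {
    // A non-2xx status may mean the daily request limit was reached.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, `fetchQuality('ml-frameworks', 'naptha', 'tesseract.js')` requests the same URL as the curl command.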

Related frameworks

open-mmlab/mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox

mayocream/koharu

ML-powered manga translator, written in Rust.

lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

mindspore-lab/mindocr

A toolbox of ocr models and algorithms based on MindSpore

zyddnys/manga-image-translator

Translate manga/image: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)
