kurianbenoy/whisper_normalizer

A Python package implementing the Whisper text normalizer

60 / 100
Established

Implements OpenAI's Whisper normalization algorithm for standardizing ASR transcription output, reducing spurious WER/CER penalties from formatting differences. Provides specialized normalizers for English and Indic languages (Malayalam, Hindi, Tamil, etc.), addressing limitations where basic normalization degrades low-resource language text. Derives Indic logic from indic-nlp-library with extended Malayalam support for script-specific canonicalization.
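To make the WER point concrete, here is a minimal, standard-library-only sketch of why normalizing before scoring matters. It is not this package's implementation: `simple_normalize` and `word_error_rate` are hypothetical helpers written for illustration, and the rules (lowercasing, dropping bracketed text, replacing punctuation and symbols with spaces) are only a rough approximation of basic normalization. The sketch deliberately leaves combining marks (Unicode category M) untouched, since stripping them would break vowel signs in Indic scripts.

```python
import re
import unicodedata


def simple_normalize(text: str) -> str:
    """Illustrative normalizer (hypothetical, NOT the package's own code)."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Drop bracketed annotations such as "[laughs]" or "<noise>".
    text = re.sub(r"[<\[][^>\]]*[>\]]", "", text)
    # Replace punctuation (P*) and symbols (S*) with spaces.
    # Combining marks (M*) are kept so Indic vowel signs survive.
    text = "".join(
        " " if unicodedata.category(ch)[0] in "PS" else ch for ch in text
    )
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


reference = "Hello, World! This is a test."
hypothesis = "hello world this is a test"

raw_wer = word_error_rate(reference, hypothesis)
normalized_wer = word_error_rate(
    simple_normalize(reference), simple_normalize(hypothesis)
)
print(f"raw WER: {raw_wer:.2f}, normalized WER: {normalized_wer:.2f}")
```

Without normalization, casing and punctuation alone count as word substitutions even though the transcription is otherwise identical; after normalization the two strings match.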

76 stars and 414,981 monthly downloads. Used by 2 other packages. No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 2 / 25
Adoption 21 / 25
Maturity 18 / 25
Community 19 / 25


Stars: 76
Forks: 17
Language: Jupyter Notebook
License: MIT
Last pushed: Oct 06, 2025
Monthly downloads: 414,981
Commits (30d): 0
Dependencies: 3
Reverse dependents: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/kurianbenoy/whisper_normalizer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
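The same request can be sketched in Python using only the standard library. Several things here are assumptions rather than documented behavior: that the path follows a category/owner/repository layout (generalized from the single URL above), that the response body is JSON, and the helper names `quality_url` and `fetch_quality`, which are hypothetical. How an API key is passed is not stated on this page, so the sketch covers only the keyless case.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is an assumption)."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record, assuming the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("voice-ai", "kurianbenoy", "whisper_normalizer"))
```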