gladiaio/normalization

A lightweight library for normalizing speech transcripts before computing WER

Emerging

Implements a **three-stage deterministic pipeline** (text pre-processing → word processing → text post-processing) where steps are declaratively composed in YAML presets, with built-in language packs for English and French. Handles domain-specific transformations like number-to-word conversion, contraction expansion, and time/currency formatting through registered step classes that protect and restore placeholders across stages. Designed for ASR benchmarking workflows where WER computation requires canonical forms of semantically equivalent transcriptions.
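As a rough illustration of the three-stage flow described above, the sketch below wires a few toy steps into a pre-processing → word processing → post-processing pipeline. Everything in it is an assumption made for illustration: the registries `TEXT_STEPS` and `WORD_STEPS`, the step names, the `xcolonx` placeholder, and the `PRESET` dict (standing in for a YAML preset) are hypothetical and are not the library's actual API.

```python
import re
from typing import Callable, Dict, List

# Hypothetical step registries (illustrative only, not the library's API).
TEXT_STEPS: Dict[str, Callable[[str], str]] = {}
WORD_STEPS: Dict[str, Callable[[str], str]] = {}


def text_step(name: str):
    """Register a function that transforms the whole transcript."""
    def deco(fn):
        TEXT_STEPS[name] = fn
        return fn
    return deco


def word_step(name: str):
    """Register a function that transforms one whitespace-separated token."""
    def deco(fn):
        WORD_STEPS[name] = fn
        return fn
    return deco


# Tiny toy lookup tables; a real language pack would be far larger.
CONTRACTIONS = {"it's": "it is", "don't": "do not"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
PLACEHOLDER = "xcolonx"  # assumed placeholder token for a protected colon


@text_step("lowercase")
def lowercase(text: str) -> str:
    return text.lower()


@text_step("protect_times")
def protect_times(text: str) -> str:
    # Hide the colon in times such as 3:30 so punctuation stripping keeps it.
    return re.sub(r"(\d{1,2}):(\d{2})", rf"\1{PLACEHOLDER}\2", text)


@text_step("strip_punctuation")
def strip_punctuation(text: str) -> str:
    # Keep word characters, whitespace, and apostrophes (needed for contractions).
    return re.sub(r"[^\w\s']", " ", text)


@word_step("expand_contractions")
def expand_contractions(word: str) -> str:
    return CONTRACTIONS.get(word, word)


@word_step("digits_to_words")
def digits_to_words(word: str) -> str:
    # Only standalone single digits are converted in this sketch.
    return DIGITS.get(word, word)


@text_step("restore_times")
def restore_times(text: str) -> str:
    return text.replace(PLACEHOLDER, ":")


@text_step("collapse_whitespace")
def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


# Stand-in for a YAML preset: an ordered list of step names per stage.
PRESET: Dict[str, List[str]] = {
    "pre": ["lowercase", "protect_times", "strip_punctuation"],
    "word": ["expand_contractions", "digits_to_words"],
    "post": ["restore_times", "collapse_whitespace"],
}


def normalize(text: str, preset: Dict[str, List[str]] = PRESET) -> str:
    for name in preset["pre"]:          # stage 1: whole-text pre-processing
        text = TEXT_STEPS[name](text)
    words = []
    for word in text.split():           # stage 2: per-word processing
        for name in preset["word"]:
            word = WORD_STEPS[name](word)
        words.append(word)
    text = " ".join(words)
    for name in preset["post"]:         # stage 3: whole-text post-processing
        text = TEXT_STEPS[name](text)
    return text
```

With this sketch, `normalize("I have 2 cats.")` and `normalize("i have two cats")` produce the same string, which is the property a WER computation relies on: semantically equivalent transcripts should not be counted as errors. Because every step is a pure function applied in a fixed order, the output is deterministic for a given preset.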

No Package · No Dependents

Maintenance 13 / 25

Adoption 5 / 25

Maturity 9 / 25

Community 14 / 25

Language: Python

License: MIT

Higher-rated alternatives

speechio/chinese_text_normalization

Chinese text normalization for speech processing

NickZaitsev/ru-normalizr

ru-normalizr: the best open-source Russian text normalizer. Converts numbers, dates, times,...

repodiac/german_transliterate

Python module to clean and transliterate (i.e. normalize) German text including abbreviations,...

google-research-datasets/TextNormalizationCoveringGrammars

Covering grammars for English and Russian text normalization

34j/mecab-text-cleaner

Simple Python package (CLI/Python API) for getting Japanese readings (yomigana) and accents using MeCab.
