gladiaio/normalization
A lightweight library for normalizing speech transcripts before computing word error rate (WER)
Implements a **three-stage deterministic pipeline** (text pre-processing → word processing → text post-processing) where steps are declaratively composed in YAML presets, with built-in language packs for English and French. Handles domain-specific transformations like number-to-word conversion, contraction expansion, and time/currency formatting through registered step classes that protect and restore placeholders across stages. Designed for ASR benchmarking workflows where WER computation requires canonical forms of semantically equivalent transcriptions.
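The three-stage flow described above can be illustrated with a minimal sketch. It is not the library's actual API: the registries, decorator names, step names, the `PRESET` dict (a stand-in for a YAML preset) and the `xcolonx` placeholder are all hypothetical, chosen only to show how declaratively ordered steps and placeholder protection might fit together.

```python
import re
from typing import Callable, Dict, List

# Hypothetical registries; the real library's step classes may look different.
TEXT_STEPS: Dict[str, Callable[[str], str]] = {}
WORD_STEPS: Dict[str, Callable[[str], str]] = {}

def text_step(name: str):
    """Register a function that transforms the whole transcript."""
    def deco(fn):
        TEXT_STEPS[name] = fn
        return fn
    return deco

def word_step(name: str):
    """Register a function that transforms a single word."""
    def deco(fn):
        WORD_STEPS[name] = fn
        return fn
    return deco

CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
PLACEHOLDER = "xcolonx"  # assumed token that survives punctuation stripping

@text_step("lowercase")
def lowercase(text: str) -> str:
    return text.lower()

@text_step("protect_times")
def protect_times(text: str) -> str:
    # Swap the colon in H:MM for a placeholder so later steps leave it alone.
    return re.sub(r"\b(\d{1,2}):(\d{2})\b", rf"\g<1>{PLACEHOLDER}\g<2>", text)

@text_step("strip_punctuation")
def strip_punctuation(text: str) -> str:
    # Keep word characters, whitespace and apostrophes (needed for contractions).
    return re.sub(r"[^\w\s']", "", text)

@text_step("restore_times")
def restore_times(text: str) -> str:
    return text.replace(PLACEHOLDER, ":")

@text_step("collapse_whitespace")
def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())

@word_step("expand_contractions")
def expand_contractions(word: str) -> str:
    return CONTRACTIONS.get(word, word)

@word_step("digit_to_word")
def digit_to_word(word: str) -> str:
    # Only single digits are handled in this sketch.
    return DIGITS.get(word, word)

# Stand-in for a YAML preset: each stage lists step names in execution order.
PRESET: Dict[str, List[str]] = {
    "text_pre": ["lowercase", "protect_times", "strip_punctuation"],
    "word": ["expand_contractions", "digit_to_word"],
    "text_post": ["restore_times", "collapse_whitespace"],
}

def normalize(text: str, preset: Dict[str, List[str]] = PRESET) -> str:
    for name in preset["text_pre"]:
        text = TEXT_STEPS[name](text)
    words = text.split()
    for name in preset["word"]:
        words = [WORD_STEPS[name](w) for w in words]
    text = " ".join(words)
    for name in preset["text_post"]:
        text = TEXT_STEPS[name](text)
    return text

print(normalize("It's 3:30, don't be late!"))  # it is 3:30 do not be late
```

Because every step is a pure function applied in a fixed order, the same input always yields the same output, which is the property a WER comparison needs: both the reference and the hypothesis transcript pass through the identical preset before scoring.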
Stars: 10
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Mar 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/gladiaio/normalization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
speechio/chinese_text_normalization
Chinese text normalization for speech processing
NickZaitsev/ru-normalizr
ru-normalizr: the best open-source normalizer for Russian text. Converts numbers, dates, times,...
repodiac/german_transliterate
Python module to clean and transliterate (i.e. normalize) German text including abbreviations,...
google-research-datasets/TextNormalizationCoveringGrammars
Covering grammars for English and Russian text normalization
34j/mecab-text-cleaner
Simple Python package (CLI/Python API) for getting japanese readings (yomigana) and accents using MeCab.