speechio/chinese_text_normalization

Chinese text normalization for speech processing

/ 100

Established

Handles multiple non-standard word (NSW) categories (cardinals, dates, fractions, money, percentages, and phone numbers) and converts them to their spoken forms for ASR pipelines. It is built from regex-based normalizers in Python and finite-state grammars in Thrax, accepts several input formats (plain text, Kaldi archives, TSV tables), and includes punctuation removal with language-specific rules. It is designed specifically for Chinese speech processing workflows rather than as a generic NLP framework.
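To illustrate what regex-based normalization of NSW tokens can look like, here is a minimal sketch covering only percentages and small cardinals. It is not the repository's code or API: the function names `cardinal_to_chinese` and `normalize` are hypothetical, and the sketch assumes integers below 10000 and ASCII digits.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def cardinal_to_chinese(n: int) -> str:
    """Spell out an integer in [0, 9999] as Mandarin numerals (hypothetical helper)."""
    if not 0 <= n < 10000:
        raise ValueError("this sketch only covers 0..9999")
    if n == 0:
        return DIGITS[0]
    digits = [int(c) for c in str(n)]
    parts = []
    pending_zero = False
    for pos, d in enumerate(digits):
        unit = UNITS[len(digits) - 1 - pos]
        if d == 0:
            # Remember an interior zero; it is only read if a non-zero digit follows.
            pending_zero = bool(parts)
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[d] + unit)
    text = "".join(parts)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一, ...
    if text.startswith("一十"):
        text = text[1:]
    return text


# Percentages first, so the digits are consumed together with the '%' sign.
PERCENT = re.compile(r"(?<!\d)(\d{1,4})%")
# Stand-alone runs of 1 to 4 digits; longer runs are left untouched in this sketch.
CARDINAL = re.compile(r"(?<!\d)\d{1,4}(?!\d)")


def normalize(text: str) -> str:
    """Rewrite percentages and small cardinals to spoken forms (hypothetical helper)."""
    text = PERCENT.sub(lambda m: "百分之" + cardinal_to_chinese(int(m.group(1))), text)
    text = CARDINAL.sub(lambda m: cardinal_to_chinese(int(m.group(0))), text)
    return text


print(normalize("增长了15%,共105人"))
```

The order of the substitutions matters here: running the cardinal rule first would rewrite the digits and leave a stray `%` behind. A full normalizer also has to decide between readings (for example, reading a phone number digit by digit), which this sketch does not attempt.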

722 stars. No commits in the last 6 months.

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25


Stars: 722

Forks: 150

Language: Python

License: MIT

Related tools

NickZaitsev/ru-normalizr

ru-normalizr is the best open-source Russian text normalizer. It converts numbers, dates, times,...

repodiac/german_transliterate

Python module to clean and transliterate (i.e. normalize) German text including abbreviations,...

gladiaio/normalization

A lightweight library for normalizing speech transcripts before computing WER

google-research-datasets/TextNormalizationCoveringGrammars

Covering grammars for English and Russian text normalization

34j/mecab-text-cleaner

Simple Python package (CLI/Python API) for getting Japanese readings (yomigana) and accents using MeCab.
