scarletcho/KoLM

A Korean text normalization and language preparation package for language modeling (LM) in Kaldi-based ASR systems

KoLM provides morphological analysis through KoNLPy/Mecab integration and generates pseudo-morphemes at two levels of granularity (micro and medium units), giving flexible tokenization options for language model training. Its pipeline chains four stages: text normalization, character transcription (numbers, hanja, hangul jamos, alphabets), morphological tagging, and grapheme-to-phoneme conversion. The output is a set of Kaldi-compatible lexicon and corpus files. The package also explicitly supports the output format of the UTagger morphological analyzer.
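To make the idea of a Kaldi-style lexicon entry concrete, here is a minimal sketch that is not KoLM's API. It assumes a purely grapheme-based lexicon in which each Hangul syllable is split into its jamo by Unicode arithmetic, and each jamo is treated as a pronunciation unit. The function names (`decompose_syllable`, `lexicon_line`, `build_lexicon`) are hypothetical, and a real grapheme-to-phoneme step would additionally apply Korean pronunciation rules that this sketch ignores.

```python
# Hypothetical sketch: grapheme-based lexicon lines in the "word unit unit ..."
# layout used by Kaldi lexicon files. This is NOT KoLM's implementation.

HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable (가)
HANGUL_LAST = 0xD7A3  # last precomposed Hangul syllable (힣)

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"     # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 27 finals plus "none"


def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its jamo.

    Characters outside the Hangul syllable block are returned unchanged.
    """
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return [ch]
    index = code - HANGUL_BASE
    initial = INITIALS[index // 588]        # 588 = 21 medials * 28 finals
    medial = MEDIALS[(index % 588) // 28]
    final = FINALS[index % 28]
    return [initial, medial] + ([final] if final else [])


def lexicon_line(word):
    """Format one entry as 'word unit1 unit2 ...'."""
    units = [jamo for ch in word for jamo in decompose_syllable(ch)]
    return word + " " + " ".join(units)


def build_lexicon(words):
    """Return sorted, de-duplicated lexicon lines for a list of words."""
    return [lexicon_line(w) for w in sorted(set(words))]


if __name__ == "__main__":
    print(lexicon_line("한글"))  # 한글 ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ
```

A grapheme-based entry like this is only a stand-in for the phoneme sequence that a full pipeline would produce, but it shows the one-word-per-line layout that the lexicon file follows.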

No commits in the last 6 months and no known dependents. Available on PyPI.

Scores: Maintenance 0/25, Adoption 11/25, Maturity 25/25, Community 19/25

Language: Python

License: GPL-3.0

Related tools

daanzu/kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

nttcslab-sp/kaldiio

A pure python module for reading and writing kaldi ark files

gooofy/py-kaldi-asr

Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as...

pykaldi/pykaldi

A Python wrapper for Kaldi

kaldi-asr/kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
