scarletcho/KoLM
A Korean text normalization and language model (LM) data preparation package for Kaldi-based ASR systems
Provides morphological analysis through KoNLPy/Mecab integration and generates pseudo-morphemes at two levels of granularity (micro and medium units) for flexible tokenization in language model training. The pipeline chains text normalization, character transcription (numbers, hanja, hangul jamos, alphabets), morphological tagging, and grapheme-to-phoneme conversion to produce Kaldi-compatible lexicon and corpus files. It also explicitly supports the output format of the UTagger morphological analyzer.
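To make the number transcription step concrete, here is a minimal sketch that rewrites digit strings as Sino-Korean hangul readings. It is not KoLM's own code: the function names `read_four`, `sino_korean`, and `transcribe_numbers` are hypothetical, and the sketch assumes every number takes a Sino-Korean reading, ignoring native Korean numerals, counters, and other context-dependent readings.

```python
import re

# Sino-Korean readings for the digits 1-9 (index 0 is unused).
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["천", "백", "십", ""]   # thousands, hundreds, tens, ones
LARGE_UNITS = ["", "만", "억", "조"]   # 10^0, 10^4, 10^8, 10^12


def read_four(n):
    """Read 0 <= n < 10000; a leading 1 before 십/백/천 is left unspoken."""
    out = []
    for digit_char, unit in zip(f"{n:04d}", SMALL_UNITS):
        d = int(digit_char)
        if d == 0:
            continue
        if d == 1 and unit:
            out.append(unit)             # 10 -> 십, not 일십
        else:
            out.append(DIGITS[d] + unit)
    return "".join(out)


def sino_korean(n):
    """Return the Sino-Korean reading of an integer 0 <= n < 10**16."""
    if not 0 <= n < 10**16:
        raise ValueError("supported range is 0 <= n < 10**16")
    if n == 0:
        return "영"
    parts = []
    group = 0
    while n > 0:
        n, chunk = divmod(n, 10000)
        if chunk:
            reading = read_four(chunk)
            if chunk == 1 and LARGE_UNITS[group] == "만":
                reading = ""             # 10000 -> 만, not 일만
            parts.append(reading + LARGE_UNITS[group])
        group += 1
    return "".join(reversed(parts))


def transcribe_numbers(text):
    """Replace every run of ASCII digits in text with its hangul reading."""
    return re.sub(r"\d+", lambda m: sino_korean(int(m.group())), text)
```

For example, `transcribe_numbers("2020년")` yields `이천이십년`. A production normalizer would also need to pick between Sino-Korean and native readings depending on the following counter word, which this sketch does not attempt.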
No commits in the last 6 months. Available on PyPI.
Stars: 63
Forks: 21
Language: Python
License: GPL-3.0
Category:
Last pushed: Apr 23, 2020
Monthly downloads: 12
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/scarletcho/KoLM"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
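The same request can be sketched in Python using only the standard library. Two things here are assumptions rather than documented behavior: that the path generalizes as `category/owner/repo` (inferred from the single URL above), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + segments)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record and decode it, assuming the body is JSON."""
    request = Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("voice-ai", "scarletcho", "KoLM")` issues the same GET request as the curl command. No API key is sent, so the call counts against the keyless daily limit.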
Related tools
daanzu/kaldi-active-grammar
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
nttcslab-sp/kaldiio
A pure python module for reading and writing kaldi ark files
gooofy/py-kaldi-asr
Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as...
pykaldi/pykaldi
A Python wrapper for Kaldi
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.