speechio/BigCiDian
A pronunciation lexicon covering both English and Chinese for automatic speech recognition (ASR).
Builds a unified IPA-based phoneset that covers the phonological systems of both languages, so that code-switched speech (e.g., English brand names embedded in Mandarin) can be handled with a single lexicon. English entries are derived from CMUDict and Chinese entries from DaCiDian. Chinese phonemes keep their tonal information (0-4), English phonemes are normalized to a toneless representation, and the combined phoneset has 56 phonemes in total. The lexicon integrates with the Kaldi ASR framework and shows modest WER improvements on the AISHELL-2 Mandarin benchmark while also enabling recognition of mixed English and Mandarin speech.
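The merging idea described above can be illustrated with a minimal sketch. It is not the project's actual code: the ARPAbet-to-IPA table is a tiny hypothetical excerpt, the `_3` tone suffix on Chinese phones is an assumed notation, and the function names are made up for illustration. The only format taken as given is the plain Kaldi-style lexicon line, `word phone1 phone2 ...`.

```python
import re

# Hypothetical, heavily truncated ARPAbet -> IPA table (illustration only;
# the real phoneset has 56 phonemes and may use different symbols).
ARPABET_TO_IPA = {"AE": "æ", "P": "p", "AH": "ʌ", "L": "l"}


def english_to_ipa(arpabet_phones):
    """Drop the trailing CMUDict stress digit and map each phone to IPA."""
    ipa = []
    for phone in arpabet_phones:
        base = re.sub(r"\d$", "", phone)  # "AE1" -> "AE", "P" stays "P"
        ipa.append(ARPABET_TO_IPA[base])
    return ipa


def merge_lexicons(english, chinese):
    """Combine both lexicons into one word -> phone-list mapping.

    English phones are converted to toneless IPA; Chinese phones are
    assumed to be IPA already and keep their tone markers unchanged.
    """
    merged = {}
    for word, phones in english.items():
        merged[word] = english_to_ipa(phones)
    for word, phones in chinese.items():
        merged[word] = list(phones)
    return merged


def to_kaldi_lines(lexicon):
    """Format entries as 'word phone1 phone2 ...' lines, sorted by word."""
    return [f"{word} {' '.join(phones)}" for word, phones in sorted(lexicon.items())]


# Toy inputs: one CMUDict-style entry and one Chinese entry whose
# tone-suffix notation ("_3" for third tone) is an assumption.
english = {"APPLE": ["AE1", "P", "AH0", "L"]}
chinese = {"你好": ["n", "i_3", "x", "ɑʊ_3"]}

lexicon = merge_lexicons(english, chinese)
lines = to_kaldi_lines(lexicon)
```

Because both languages end up in one phone inventory, a single acoustic model can score an English word inside a Mandarin utterance without switching lexicons.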
262 stars. No commits in the last 6 months.
Stars: 262
Forks: 55
Language: Python
License: none listed
Category:
Last pushed: Oct 11, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/speechio/BigCiDian"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
apluka34/Bud500
Bud500: A Comprehensive Vietnamese ASR Dataset
qianchang/zici
Zici (字词): a collection of resources on Chinese classics and on the pinyin of Chinese characters and words
gheyret/UQSpeechDataset
Uyghur Single Speaker Speech Dataset
harisbinzia/PronouncUR
PronouncUR: An Urdu Pronunciation Lexicon Generator
jonsafari/buckeye_dict
Buckeye Pronunciation Dictionary