speechio/BigCiDian

Pronunciation lexicon covering both English and Chinese for automatic speech recognition (ASR).

40 / 100 (Emerging)

Builds a unified IPA-based phone set that maps the phonological systems of both languages, so that code-switched speech (e.g., English brand names embedded in Mandarin) can be handled with a single lexicon. English entries are derived from CMUDict and Chinese entries from DaCiDian. Chinese tonal information (tones 0-4) is preserved, while English is normalized to a toneless representation, for 56 phonemes in total. Integrates with the Kaldi ASR toolkit and reports modest WER improvements on the AISHELL-2 Mandarin benchmark while also supporting bilingual recognition.
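The merging idea described above can be illustrated with a minimal Python sketch. It is not the repository's actual code: the two mapping tables contain only a few illustrative symbols, the `_N` tone suffix on Mandarin finals is an assumed convention, and the function names (`english_to_ipa`, `mandarin_to_ipa`, `build_lexicon`) are hypothetical. The only format detail taken as given is that CMUDict marks vowel stress with a trailing digit (0-2), which is dropped to obtain the toneless English representation.

```python
import re

# Hypothetical mapping tables for illustration only; the real BigCiDian
# tables cover all 56 phonemes and may use different symbols.
ARPABET_TO_IPA = {"AE": "æ", "P": "p", "AH": "ə", "L": "l"}
PINYIN_TO_IPA = {"p": "pʰ", "ing": "iŋ", "g": "k", "uo": "uo"}


def english_to_ipa(arpabet_phones):
    """Map CMUDict phones to IPA, dropping the trailing stress digit (0-2)."""
    return [ARPABET_TO_IPA[re.sub(r"\d$", "", p)] for p in arpabet_phones]


def mandarin_to_ipa(syllables):
    """Map (initial, final, tone) triples to IPA, keeping tone 0-4 on the final."""
    phones = []
    for initial, final, tone in syllables:
        if not 0 <= tone <= 4:
            raise ValueError(f"tone must be 0-4, got {tone}")
        if initial:  # zero-initial syllables contribute no initial phone
            phones.append(PINYIN_TO_IPA[initial])
        # The "_N" tone suffix is an assumed convention, not a confirmed format.
        phones.append(f"{PINYIN_TO_IPA[final]}_{tone}")
    return phones


def build_lexicon(english_entries, mandarin_entries):
    """Merge both sources into one word -> phone-list dictionary."""
    lexicon = {w: english_to_ipa(p) for w, p in english_entries.items()}
    lexicon.update({w: mandarin_to_ipa(s) for w, s in mandarin_entries.items()})
    return lexicon


lexicon = build_lexicon(
    {"APPLE": ["AE1", "P", "AH0", "L"]},
    {"苹果": [("p", "ing", 2), ("g", "uo", 3)]},
)
for word, phones in lexicon.items():
    # One "word phone1 phone2 ..." line per entry, as in a Kaldi-style lexicon.txt.
    print(word, " ".join(phones))
```

Because both languages end up in one phone inventory, a single lexicon file can serve an utterance that mixes Mandarin and English words.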

262 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 22 / 25


Stars: 262
Forks: 55
Language: Python
License: none
Last pushed: Oct 11, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/speechio/BigCiDian"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
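The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the `category/owner/repo` path layout is inferred from the single URL shown above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is returned as raw text because its format is not documented here. How an API key would be supplied is not stated, so the sketch covers only keyless access.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow a category/owner/repo layout.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *segments])


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the endpoint and return the response body as text (format unverified)."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access; call fetch_quality(...) to send the request.
print(quality_url("voice-ai", "speechio", "BigCiDian"))
```

With the keyless limit of 100 requests/day, caching the returned text locally is a reasonable precaution when polling more than a handful of repositories.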