clovaai/ClovaCall

ClovaCall dataset and Pytorch LAS baseline code (Interspeech 2020)

/ 100

Emerging

Baseline implementation of the Listen, Attend and Spell (LAS) seq2seq architecture for Korean goal-oriented dialog ASR, using a convolutional feature extractor (2D convolution) followed by a bidirectional LSTM encoder and an attention-based decoder. The dataset contains 112K utterance-transcript pairs from contact center interactions in the restaurant reservation domain, provided in both raw and silence-eliminated versions and totaling ~125 hours of multi-speaker Korean speech. The code supports PyTorch-based training with data augmentation (noise augmentation, SpecAugment) and evaluation by character error rate (CER).
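For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between reference and hypothesis, divided by the number of reference characters. The sketch below is a minimal illustration of that definition and is not taken from the ClovaCall code; the function names `edit_distance` and `cer` and the `ignore_spaces` option are assumptions made for this example, and the repository's own scoring may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert/delete/substitute = 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses, ignore_spaces=True):
    """Corpus-level CER: total edit distance / total reference characters.

    ignore_spaces is a hypothetical option for this sketch; whether spaces
    count as characters is a scoring choice that varies between toolkits.
    """
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        if ignore_spaces:
            ref = ref.replace(" ", "")
            hyp = hyp.replace(" ", "")
        total_errors += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer(["abcd"], ["abxd"]))  # 0.25
```

Python strings index Korean text by precomposed Hangul syllable, so this sketch counts errors at the syllable level; scoring at the jamo level would require decomposing the text first.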

223 stars. No commits in the last 6 months.


Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 22 / 25


Stars

223

Forks

Language

Python

License

MIT

Higher-rated alternatives

TensorSpeech/TensorFlowASR

TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2....

xinjli/allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

dangvansam/viet-asr

VietASR - Vietnamese Automatic Speech Recognition

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

srvk/eesen

The official repository of the Eesen project
