kaituoxu/Speech-Transformer

A PyTorch implementation of Speech-Transformer, an end-to-end automatic speech recognition (ASR) model built on the Transformer network, for Mandarin Chinese.

/ 100

Emerging

Implements end-to-end ASR by directly mapping acoustic features to character sequences without intermediate phoneme representations. Leverages Kaldi for feature extraction and integrates with Visdom for real-time loss visualization during training. Demonstrates competitive character error rates (12.8% CER) on the AISHELL Mandarin dataset compared to LSTM and attention-based baselines.
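For readers unfamiliar with the metric, character error rate is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the number of reference characters. The sketch below illustrates that general definition only; it is not taken from the repository, and the function names `edit_distance` and `cer` as well as the sample sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edit errors over total reference characters."""
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        total_errors += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


# Hypothetical transcripts (not AISHELL data): one substituted character in eight.
refs = ["今天天气很好", "你好"]
hyps = ["今天天汽很好", "你好"]
print(f"CER: {cer(refs, hyps):.1%}")
```

Summing errors and reference lengths over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.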

809 stars. No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 25 / 25

Stars: 809

Forks: 196

Language: Python

License: None

Higher-rated alternatives

TensorSpeech/TensorFlowASR

TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2....

xinjli/allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

dangvansam/viet-asr

VietASR - Vietnamese Automatic Speech Recognition

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

srvk/eesen

The official repository of the Eesen project
