philipperemy/tensorflow-ctc-speech-recognition

Application of Connectionist Temporal Classification (CTC) for Speech Recognition (TensorFlow 1.0, but compatible with 2.0).

48 / 100 (Emerging)

Uses LSTM networks trained with CTC loss to transcribe speech directly to text, trained and evaluated on the VCTK Corpus with configurable batch sizes and network architectures. Audio features are extracted with librosa and python_speech_features, and the resulting spectrograms are fed through recurrent layers; CTC decoding then handles the variable-length alignment between audio and text without explicit frame-level annotations. The project demonstrates end-to-end training on single-speaker subsets and shows reasonable generalization despite limited data, using random silence truncation to make validation more realistic.
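The CTC decoding step mentioned above can be illustrated with a minimal greedy decoder in plain Python. This is a sketch under stated assumptions, not the repository's code: the alphabet, the choice of index 0 as the blank label, and the function names `collapse_path` and `greedy_ctc_decode` are all hypothetical. It takes the highest-scoring label in each frame, merges consecutive repeats, and then drops blanks, which is how CTC maps many audio frames onto a shorter text.

```python
from itertools import groupby

# Hypothetical alphabet for illustration: index 0 ("_") plays the role of the CTC blank.
ALPHABET = "_ abcdefghijklmnopqrstuvwxyz"
BLANK = 0


def collapse_path(path):
    """Apply the CTC collapsing rule: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return "".join(ALPHABET[i] for i in merged if i != BLANK)


def greedy_ctc_decode(frame_scores):
    """Pick the best label per frame (best-path decoding), then collapse the path.

    frame_scores: one list of scores over ALPHABET per audio frame.
    """
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    return collapse_path(best_path)


def one_hot(symbol):
    """Build a toy score vector that puts all its mass on one symbol."""
    scores = [0.0] * len(ALPHABET)
    scores[ALPHABET.index(symbol)] = 1.0
    return scores


# Nine frames decode to five characters; the blank between the two "l" runs
# is what lets the double letter survive the repeat-merging step.
frames = [one_hot(s) for s in "hh_ell_lo"]
decoded = greedy_ctc_decode(frames)
print(decoded)
```

Without the blank frame between the two runs of "l", the repeats would merge into a single letter. A real system would feed the network's per-frame softmax outputs in place of the toy one-hot vectors, and might use beam search rather than the greedy best path.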

131 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars 131
Forks 47
Language Python
License Apache-2.0
Last pushed Mar 04, 2021
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/philipperemy/tensorflow-ctc-speech-recognition"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
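The same request can be sketched in Python using only the standard library. Two things here are assumptions rather than documented behavior: that the URL follows a category/owner/repository pattern (inferred from the single example above), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path layout beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL, assuming a category/owner/repo path layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record. Assumes the response body is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("voice-ai", "philipperemy", "tensorflow-ctc-speech-recognition")
print(url)
```

Calling `fetch_quality("voice-ai", "philipperemy", "tensorflow-ctc-speech-recognition")` would perform the actual network request, which counts against the daily request limit.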