tensorflow-ctc-speech-recognition and ctc-asr
These projects are direct alternatives: both are independent, end-to-end speech recognition systems that train RNNs with the CTC loss.
About tensorflow-ctc-speech-recognition
philipperemy/tensorflow-ctc-speech-recognition
Application of Connectionist Temporal Classification (CTC) for Speech Recognition (Tensorflow 1.0 but compatible with 2.0).
Uses LSTM networks with CTC loss to decode speech directly to text, trained and evaluated on the VCTK Corpus with configurable batch sizes and network architectures. Extracts audio features via librosa and python_speech_features, then feeds the resulting spectrogram-like features through recurrent layers followed by CTC decoding, which handles variable-length audio-to-text alignment without explicit frame-level annotations. Demonstrates end-to-end training on single-speaker subsets, achieving reasonable generalization despite limited data; techniques such as random silence truncation are used to make validation more realistic.
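The key idea behind CTC is that the per-frame label distributions an LSTM emits can be summed over all frame-level alignments of a transcript, so no frame-level annotation is needed. As a minimal, self-contained sketch (not code from either repository), the CTC negative log-likelihood can be computed with the standard forward (alpha) recursion over a blank-extended label sequence:

```python
import numpy as np

def ctc_forward_nll(log_probs, labels, blank=0):
    """CTC negative log-likelihood via the forward (alpha) recursion.

    log_probs: (T, V) per-frame log-probabilities (e.g. the log-softmax
    outputs of an LSTM). labels: non-empty target indices, no blanks.
    """
    T, _ = log_probs.shape
    # Extended sequence: a blank between every label and at both ends.
    ext = [blank]
    for y in labels:
        ext.extend([y, blank])
    S = len(ext)

    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    alpha[0, 1] = log_probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                              # stay on ext[s]
            if s > 0:
                a = np.logaddexp(a, alpha[t - 1, s - 1])     # advance by one
            # A blank may be skipped only between two *different* labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]

    # Valid alignments end on the last label or the trailing blank.
    return -np.logaddexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

In the frameworks both repositories use, this quantity is provided ready-made (e.g. `tf.nn.ctc_loss` in TensorFlow); the sketch only illustrates what that loss sums over.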
About ctc-asr
mdangschat/ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Implements bidirectional RNN layers with dense layers trained on 900+ hours of multi-corpus audio data (LibriSpeech, Common Voice, TEDLIUM, Tatoeba), achieving 12.6% WER without external language models. Built on TensorFlow with configurable architecture parameters, supporting GPU acceleration and modular training/evaluation workflows via CSV-based corpus definitions. Includes utilities for multi-corpus preparation, checkpoint management, and real-time training visualization through TensorBoard.
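Reporting a WER "without external language models" usually implies best-path (greedy) CTC decoding: take the argmax label at each frame, collapse consecutive repeats, then drop blanks. A minimal sketch of that collapse rule (an illustration, not the repository's decoder):

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path CTC decoding.

    log_probs: (T, V) per-frame log-probabilities. Picks the most likely
    symbol per frame, merges consecutive duplicates, removes blanks.
    """
    best_path = np.argmax(log_probs, axis=1)
    decoded, prev = [], None
    for sym in best_path:
        if sym != prev and sym != blank:   # new, non-blank symbol
            decoded.append(int(sym))
        prev = sym
    return decoded
```

Beam-search decoders (optionally fused with a language model) replace this argmax step when lower WER is needed; the greedy version is the baseline both repositories' CTC setups support out of the box via their frameworks' built-in decoders.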