ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Built on fairseq, it uses a unified encoder-decoder architecture trained with multi-task learning to handle ASR, speech-to-text translation, speech-to-speech translation, and TTS in both offline and low-latency streaming modes. During simultaneous translation the model emits intermediate results incrementally, so ASR transcriptions and translated text are available in real time, before the final speech synthesis completes.
1,252 stars. No commits in the last 6 months.
Stars: 1,252
Forks: 102
Language: Python
License: MIT
Category:
Last pushed: Jun 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/ictnlp/StreamSpeech"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
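The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. It assumes the endpoint returns a JSON body (the response schema is not documented here) and that other repositories follow the same `quality/<category>/<owner>/<repo>` path pattern as the single URL shown above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository.

    Assumption: the path pattern <category>/<owner>/<repo> generalizes
    from the one example URL on this page.
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumption: the response body is JSON; its fields are not documented here.
    """
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    data = fetch_quality("voice-ai", "ictnlp/StreamSpeech")
    print(json.dumps(data, indent=2))
```

Because keyless access is limited to 100 requests/day, caching the response locally is advisable if the data is polled regularly.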
Higher-rated alternatives
speechmatics/speechmatics-python
Python library and CLI for Speechmatics
gooofy/py-nltools
A collection of basic python modules for spoken natural language processing
IBM/MAX-Speech-to-Text-Converter
Converts spoken words into text form.
snakers4/open_stt
Open STT
verbio-technologies/python-verbio-speech-center
Python integration with the Verbio Speech Center Cloud. https://speechcenter.verbio.com/