microsoft/SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Quality score: 46 / 100 (Emerging)

Implements an encoder-decoder transformer architecture with unified masked prediction across speech and text modalities, leveraging unpaired data to improve representation learning. Supports diverse downstream tasks including ASR, speech synthesis, and speech translation, with multilingual processing through variants like Speech2C, SpeechLM, and VioLA. Integrates with HuggingFace and provides pre-trained checkpoints at base and large scales, trained on the LibriSpeech and Libri-Light corpora.

1,435 stars. No commits in the last 6 months.

Flags: stale for 6 months · no published package · no known dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25
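The overall 46 / 100 appears to be the plain sum of the four category scores above. That is an assumption — the scoring methodology is not spelled out on this page — but the arithmetic checks out:

```python
# Category scores as displayed above (each out of 25).
scores = {"Maintenance": 0, "Adoption": 10, "Maturity": 16, "Community": 20}

# Assumption: the overall score is the unweighted sum of the categories.
overall = sum(scores.values())
print(overall)  # 46, matching the displayed 46 / 100
```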


Stars: 1,435
Forks: 135
Language: Python
License: MIT
Last pushed: Apr 24, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/microsoft/SpeechT5"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
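The same request can be made from Python with the standard library. This is a minimal sketch: `build_quality_url` and `fetch_quality` are illustrative helper names, and the shape of the JSON response is not documented here, so the code only fetches and decodes it.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-report URL for a repository (helper name is illustrative)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (keyless tier: 100 requests/day)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    report = fetch_quality("voice-ai", "microsoft", "SpeechT5")
    print(json.dumps(report, indent=2))
```

The URL builder reproduces the exact path shown in the curl example above; swap in a different category or owner/repo pair to query other repositories.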