modelscope/ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

/ 100

Emerging

Supports speech super-resolution (bandwidth extension to 48kHz), audio-visual speaker extraction conditioned on face/gesture/EEG signals, and multi-format audio input (wav, mp3, flac, opus, etc.). The toolkit provides unified inference via a NumPy-array interface for flexible pipeline integration, plus separate training modules with data generation scripts for enhancement, separation, and super-resolution tasks. Integrates with ModelScope and HuggingFace for model distribution and includes SpeechScore, a quality assessment toolkit with intrusive and non-intrusive metrics (PESQ, STOI, DNSMOS, NISQA, DISTILL_MOS).

3,962 stars. No commits in the last 6 months.

Stale 6m

No Package

No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25


Stars: 3,962

Forks: 325

Language: Python

License: Apache-2.0

Higher-rated alternatives

espnet/espnet

End-to-End Speech Processing Toolkit

yeyupiaoling/PPASR

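End-to-end Chinese speech recognition implemented with PaddlePaddle, from beginner level to hands-on practice, with very simple introductory examples and practical enterprise projects. Supports the currently popular DeepSpeech2, Conformer, and Squeezeformer models.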

flashlight/wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

yeyupiaoling/PaddlePaddle-DeepSpeech

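Chinese speech recognition implemented with PaddlePaddle. The project is complete and recognition quality is good. Supports training and inference on Windows and Linux, and inference on Nvidia Jetson development boards.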

philipperemy/deep-speaker

Deep Speaker: an End-to-End Neural Speaker Embedding System.
