FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Combines ASR, emotion recognition, and audio event detection in a single non-autoregressive end-to-end model trained on 400k+ hours of audio across 50+ languages. Achieves 15× faster inference than Whisper-Large while supporting timestamp generation via CTC alignment, with export options for ONNX and libtorch. Integrates with the FunASR framework for streamlined deployment, with multi-concurrent request handling and client SDKs (Python, C++, Java, C#).
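Because the model folds emotion and event detection into one pass, its raw transcripts carry special tokens for language, emotion, and audio-event labels (e.g. `<|en|><|NEUTRAL|><|Speech|>hello`). A minimal sketch of splitting such output into tags and plain text; the exact token names are assumptions based on the model's published examples, not a definitive spec.

```python
import re

# SenseVoice-style rich transcripts prefix the text with special tokens,
# e.g. "<|en|><|NEUTRAL|><|Speech|>hello world". The tag names used here
# are illustrative, drawn from the model's public examples.
TAG_RE = re.compile(r"<\|([^|]+)\|>")

def parse_rich_transcript(raw: str) -> tuple[list[str], str]:
    """Split a raw rich transcript into its tag list and plain text."""
    tags = TAG_RE.findall(raw)      # e.g. ["en", "NEUTRAL", "Speech"]
    text = TAG_RE.sub("", raw).strip()
    return tags, text
```

This keeps downstream code simple: the same string carries both the transcription and the classification labels, and consumers that only want text can discard the tags.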
Stars: 7,691
Forks: 708
Language: Python
License: —
Category: —
Last pushed: Dec 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FunAudioLLM/SenseVoice"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
travisvn/chatterbox-tts-api
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate...
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment...
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
OpenMOSS/MOSS-TTS
MOSS-TTS Family is an open-source speech and sound generation model family from MOSI.AI and the...
sfortis/openai_tts
Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible...