FunASR and Fun-ASR
These are competing projects that both provide end-to-end ASR systems with similar core functionality. FunASR, from ModelScope, appears to be the more established general-purpose toolkit, while Fun-ASR, from FunAudioLLM, integrates LLM capabilities for potentially richer speech understanding.
About FunASR
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Built on non-autoregressive architectures like Paraformer, FunASR combines ASR with complementary tasks—VAD, punctuation restoration, speaker diarization, and emotion recognition—within a unified framework. The toolkit integrates with ModelScope and Hugging Face for model distribution, and provides production-ready runtime services with optimized CPU/GPU inference pipelines supporting both offline batch processing and low-latency streaming transcription across 31+ languages.
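The unified pipeline above can be sketched with FunASR's Python `AutoModel` API, which chains the ASR, VAD, and punctuation models in one call. This is a minimal illustration, not a tested recipe: the model names below (`paraformer-zh`, `fsmn-vad`, `ct-punc`) are the ones commonly shown in FunASR's documentation, and running it requires `pip install funasr` plus a one-time model download from ModelScope.

```python
def transcribe(audio_path: str):
    """Run FunASR's combined ASR + VAD + punctuation pipeline on one file.

    Sketch only: assumes `funasr` is installed and the named models are
    available to download from ModelScope on first use.
    """
    from funasr import AutoModel  # deferred import: funasr is an external dependency

    model = AutoModel(
        model="paraformer-zh",   # non-autoregressive Paraformer ASR model
        vad_model="fsmn-vad",    # voice activity detection front-end
        punc_model="ct-punc",    # punctuation restoration post-processor
    )
    # batch_size_s controls how many seconds of audio are batched per step
    return model.generate(input=audio_path, batch_size_s=300)


if __name__ == "__main__":
    print(transcribe("example.wav"))
```

The VAD model segments long recordings so the ASR model only sees speech regions, and the punctuation model restores readable text afterwards, which is why all three are wired together in a single `AutoModel` instance.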
About Fun-ASR
FunAudioLLM/Fun-ASR
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
Fun-ASR supports 31 languages with specialized optimization for dialects and accents (7 Chinese dialects, 26 regional accents), enabling low-latency real-time transcription via an end-to-end architecture trained on tens of millions of hours of speech data. Features include VAD integration, punctuation restoration, hotword customization, and robust performance in far-field and high-noise scenarios (93% reported accuracy). It integrates with the ModelScope and Hugging Face ecosystems through the `funasr` toolkit, supporting inference via `AutoModel` or direct model loading with configurable language, ITN (inverse text normalization), and batch processing.
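The `AutoModel` route with configurable language and ITN might look like the sketch below. The model identifier string is an assumption for illustration only; check the Fun-ASR model card on ModelScope or Hugging Face for the exact name, and note that the `language` and `use_itn` keyword arguments are the ones FunASR documents for its multilingual models.

```python
def transcribe_fun_asr(audio_path: str, language: str = "zh", itn: bool = True):
    """Sketch of Fun-ASR inference via the funasr toolkit's AutoModel.

    Assumptions (verify against the official model card):
      - "FunAudioLLM/Fun-ASR" is a placeholder model identifier
      - `language` selects the target language/dialect
      - `use_itn` toggles inverse text normalization of the output
    """
    from funasr import AutoModel  # deferred import: funasr is an external dependency

    model = AutoModel(model="FunAudioLLM/Fun-ASR")  # assumed identifier
    return model.generate(input=audio_path, language=language, use_itn=itn)
```

Enabling ITN converts spoken forms such as "twenty twenty-four" into "2024" in the transcript, which is usually what downstream text consumers want.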