FireRedASR and FireRedASR2S

FireRedASR2S is a more comprehensive successor that integrates the ASR capabilities of FireRedASR with additional modules (VAD, LID, Punc) into a unified system, making them successive generations of the same ecosystem rather than true alternatives.

                 FireRedASR       FireRedASR2S
Overall score    55               46
Status           Established      Emerging
Maintenance      10/25            13/25
Adoption         10/25            10/25
Maturity         16/25            11/25
Community        19/25            12/25
Stars            1,796            365
Forks            159              20
Downloads        n/a              n/a
Commits (30d)    0                0
Language         Python           Python
License          Apache-2.0       Apache-2.0
Package          None             None
Dependents       None             None

About FireRedASR

FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects, and English, achieving a new SOTA on public Mandarin ASR benchmarks while also offering outstanding singing-lyrics recognition capability.

Built on two complementary architectures—an Encoder-Adapter-LLM framework for peak performance and an Attention-based Encoder-Decoder for efficiency—FireRedASR enables end-to-end speech interaction while serving as a representation module in LLM-based systems. The framework integrates with Qwen2 for LLM variants and supports batch beam search decoding with configurable parameters (beam size, length penalties, temperature). Models are distributed via Hugging Face with Python and CLI interfaces, supporting audio up to 60s (AED) or 30s (LLM) at 16kHz.
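The decoding controls mentioned above (beam size, length penalty, temperature) can be illustrated with a minimal, self-contained beam-search sketch. Everything here is illustrative: the toy language model and the ((5 + len)/6)^alpha length normalization are common conventions assumed for the example, not FireRedASR's actual implementation.

```python
import math

def beam_search(next_logprobs, beam_size, max_len, eos_id,
                length_penalty=0.6, temperature=1.0):
    """Toy beam search over a callable next_logprobs(prefix) -> {token: logprob}.

    length_penalty uses the common ((5 + len)/6)**alpha normalization;
    temperature rescales log-probabilities before ranking candidates.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp / temperature))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos_id:
                norm = ((5 + len(seq)) / 6) ** length_penalty
                finished.append((seq, score / norm))
            else:
                beams.append((seq, score))
        if not beams:
            break
    if not finished:  # fall back to unfinished beams if nothing emitted eos
        finished = list(beams)
    return max(finished, key=lambda c: c[1])

def toy_model(prefix):
    # Hypothetical 3-token LM (0 = eos): prefers token 1, then ends.
    if len(prefix) < 2:
        return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}
    return {0: math.log(0.8), 1: math.log(0.1), 2: math.log(0.1)}

best_seq, best_score = beam_search(toy_model, beam_size=3, max_len=5, eos_id=0)
# best_seq == [1, 1, 0]: the greedy-preferred tokens followed by eos.
```

Raising the temperature flattens the per-step scores, making lower-probability branches more competitive; a larger length penalty exponent favors longer hypotheses, which matters for long-form audio like the 60 s AED inputs.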

About FireRedASR2S

FireRedTeam/FireRedASR2S

A SOTA industrial-grade all-in-one ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin plus 20+ dialects/accents), English, code-switching, and both speech and singing ASR. FireRedVAD detects speech, singing, and music in 100+ languages. FireRedLID identifies 100+ languages and 20+ Chinese dialects. FireRedPunc restores punctuation for Chinese and English.

Built on encoder-adapter-LLM and attention-based encoder-decoder architectures, FireRedASR2S performs end-to-end speech processing with VAD pre-filtering, language identification, ASR decoding, and punctuation restoration in a unified pipeline. The system supports both streaming and non-streaming modes, with TensorRT-LLM acceleration delivering 12.7x speedup on GPU inference. Integration with vLLM and availability on both Hugging Face and ModelScope enables flexible deployment across research and production environments.
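The staged pipeline described above (VAD pre-filtering, then language identification, ASR decoding, and punctuation restoration) can be sketched as a chain of pluggable stages. The stage functions and their stub outputs below are hypothetical placeholders standing in for FireRedVAD, FireRedLID, FireRedASR2, and FireRedPunc, not the system's real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    start_s: float
    end_s: float
    lang: str = ""
    text: str = ""

def run_pipeline(audio_duration_s: float,
                 vad: Callable[[float], List[Segment]],
                 lid: Callable[[Segment], str],
                 asr: Callable[[Segment], str],
                 punc: Callable[[str], str]) -> List[Segment]:
    """Chain VAD -> LID -> ASR -> Punc over one audio clip."""
    segments = vad(audio_duration_s)      # keep only speech regions
    for seg in segments:
        seg.lang = lid(seg)               # tag language before decoding
        seg.text = punc(asr(seg))         # transcribe, then restore punctuation
    return segments

# Stub stages for illustration only.
def stub_vad(duration):
    # Pretend the first 0.5 s is silence and the rest is speech.
    return [Segment(0.5, duration)]

def stub_lid(seg):
    return "zh"

def stub_asr(seg):
    return "你好世界"

def stub_punc(text):
    return text + "。"

result = run_pipeline(10.0, stub_vad, stub_lid, stub_asr, stub_punc)
# result[0] -> Segment(start_s=0.5, end_s=10.0, lang='zh', text='你好世界。')
```

Structuring the stages behind plain callables like this is one way a unified system can swap streaming and non-streaming implementations (or a TensorRT-LLM-accelerated ASR backend) without changing the pipeline wiring.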

Scores updated daily from GitHub, PyPI, and npm data.