kaushiknishchay/ComfyUI-Qwen3-ASR
ComfyUI nodes for Qwen3-ASR (0.6B/1.7B) and ForcedAligner. Supports high-accuracy ASR and language identification for 52 languages/dialects, including 22 Chinese dialects and various English accents. Features word-level timestamps, long audio transcription, and VRAM-optimized inference.
Integrates Qwen3-ForcedAligner as an optional secondary model that produces word-level timestamps through forced alignment rather than direct prediction. Long recordings are processed in chunks with configurable overlap (30 s chunks with 2 s of context by default). Input audio is automatically resampled to 16 kHz, and optional FlashAttention 2 acceleration reduces the VRAM footprint. Inference can run in bf16, fp16, or fp32 precision.
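To make the chunking description concrete, here is a minimal sketch of how overlapping chunk boundaries could be computed. It is not taken from the repository: the function name `chunk_bounds` and its parameters are hypothetical, and it assumes that "2s context" means each chunk starts 2 s before the previous chunk ends.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16_000,   # assumed: audio already resampled to 16 kHz
    chunk_s: float = 30.0,       # default chunk length from the description
    overlap_s: float = 2.0,      # assumed meaning of "2s context"
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks (hypothetical helper)."""
    chunk = int(chunk_s * sample_rate)
    overlap = int(overlap_s * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    step = chunk - overlap  # each new chunk advances by chunk minus overlap
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # last chunk reached the end of the audio
            break
        start += step
    return bounds


if __name__ == "__main__":
    # Print chunk boundaries in seconds for a 70 s recording.
    for start, end in chunk_bounds(70 * 16_000):
        print(start / 16_000, end / 16_000)
```

Under these assumptions, a 70 s recording yields three chunks covering 0-30 s, 28-58 s, and 56-70 s. How the transcripts of overlapping regions are merged is a separate step that this sketch does not cover.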
Stars: 11
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Mar 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/kaushiknishchay/ComfyUI-Qwen3-ASR"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
BoltzmannEntropy/MimikaStudio
MimikaStudio - A local-first application for macOS (Apple Silicon) + Agentic MCP Support
aahl/qwen-asr2api
🎤 Qwen 3 ASR to OpenAI API, a free STT speech recognition model
zhao-kun/VibeVoiceFusion
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA...
gabriele-mastrapasqua/qwen3-tts
Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS....
aahl/qwen-tts2api
🗣️ Qwen TTS to OpenAI Speech API