m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Builds on OpenAI's Whisper by combining faster-whisper for batched GPU inference (up to 70x real-time transcription with Whisper large-v2) with wav2vec2-based forced phoneme alignment for accurate word-level timestamps. Integrates pyannote-audio for speaker diarization, and applies voice activity detection (VAD) preprocessing to reduce hallucinations and enable batching without degrading word error rate. Supports multiple languages, automatically selecting a language-specific alignment model from HuggingFace or torchaudio.
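Word-level timestamps are what make diarization useful at the word level: once each word has a start and end time, it can be tagged with whichever diarization speaker turn overlaps it most. whisperX exposes this step as `whisperx.assign_word_speakers`; the following is a simplified, pure-Python sketch of the idea, not that function's implementation. The dict and tuple shapes, the helper names `overlap` and `assign_speakers`, and the `SPEAKER_00`-style labels are illustrative assumptions.

```python
from collections import defaultdict
from typing import Optional

# Assumed shapes (illustrative only):
#   words: dicts with "word", "start", "end" (seconds), as produced by alignment
#   turns: (start, end, speaker) tuples, as produced by a diarization pipeline


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list, turns: list) -> list:
    """Label each word with the speaker whose turns overlap it for the longest total time."""
    labeled = []
    for w in words:
        # Sum overlap per speaker across all of that speaker's turns.
        totals = defaultdict(float)
        for t_start, t_end, speaker in turns:
            totals[speaker] += overlap(w["start"], w["end"], t_start, t_end)
        best: Optional[str] = None
        if totals:
            candidate = max(totals, key=totals.get)
            # Words that fall entirely outside every turn stay unlabeled.
            if totals[candidate] > 0:
                best = candidate
        labeled.append({**w, "speaker": best})
    return labeled


words = [
    {"word": "hello", "start": 0.10, "end": 0.45},
    {"word": "there", "start": 0.50, "end": 0.90},
    {"word": "hi", "start": 1.30, "end": 1.55},
]
turns = [(0.0, 1.0, "SPEAKER_00"), (1.2, 2.0, "SPEAKER_01")]

for w in assign_speakers(words, turns):
    print(w["word"], w["speaker"])
# hello SPEAKER_00
# there SPEAKER_00
# hi SPEAKER_01
```

Summing overlap per speaker, rather than taking the single longest turn, keeps a word that straddles two turns of the same speaker from being handed to a third speaker with one slightly longer turn; words that fall in silence between turns keep `speaker` set to `None`.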
20,758 stars and 864,629 monthly downloads. Used by 5 other packages. Actively maintained with 15 commits in the last 30 days. Available on PyPI.
Stars: 20,758
Forks: 2,188
Language: Python
License: BSD-2-Clause
Last pushed: Mar 17, 2026
Monthly downloads: 864,629
Commits (30d): 15
Dependencies: 12
Reverse dependents: 5
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/m-bain/whisperX"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free API key.
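For scripted access from Python, the same endpoint can be requested with the standard library alone. This is a minimal sketch assuming the endpoint returns a UTF-8 JSON body; the response fields are not documented here, so the code only decodes and returns whatever comes back. `API_URL` and `fetch_json` are hypothetical names, and no API key is sent, so the keyless 100 requests/day limit applies.

```python
import json
import urllib.request

# Endpoint copied from the curl example above; no API key is attached (keyless tier).
API_URL = "https://pt-edge.onrender.com/api/v1/quality/voice-ai/m-bain/whisperX"


def fetch_json(url: str, timeout: float = 10.0):
    """Send a GET request to `url` and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Assumption: the API responds with UTF-8 encoded JSON.
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_json(API_URL)` returns the decoded response as plain Python objects, which can then be inspected or stored without a third-party HTTP client.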
Related tools
tsmdt/whisply: 💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and...
MahmoudAshraf97/whisper-diarization: Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
linto-ai/linto-stt: An automatic speech recognition API
jim60105/docker-whisperX: Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker...
ringger/transcribe-critic: Multi-source transcript merging inspired by textual criticism — LLM adjudicates multiple...