whisperX and whisper-v3-diarization
WhisperX is the underlying transcription, alignment, and diarization library that whisper-v3-diarization wraps into a production-ready CLI and GUI application. The two are therefore complements meant to be used together, not alternatives.
About whisperX
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Builds on OpenAI's Whisper by combining faster-whisper for batched GPU inference (up to 70x real-time transcription with large-v2) with wav2vec2 phoneme-based forced alignment to produce accurate word-level timestamps. Integrates pyannote-audio for speaker diarization and applies voice activity detection (VAD) preprocessing to reduce hallucinations without degrading transcription quality. Supports multiple languages and automatically selects a language-specific alignment model from Hugging Face or torchaudio.
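The diarization step only becomes useful once speaker turns are merged with the word-level timestamps. A minimal way to do that is to give each word the speaker whose turn overlaps it for the longest time. The sketch below shows this in plain Python. It is a simplified illustration, not whisperX's own implementation, and the names `Word`, `Turn`, `overlap`, and `assign_speakers` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical transcribed word with start/end times in seconds."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class Turn:
    """Hypothetical diarization turn: one speaker active from start to end (seconds)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None. On an exact tie, the turn
    that appears first in `turns` wins, because only a strictly larger
    overlap replaces the current best.
    """
    for word in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(word.start, word.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        word.speaker = best_speaker
    return words


# Usage with made-up timings: two speakers, one word straddling the boundary,
# and one word that falls outside every turn.
words = [
    Word("hello", 0.0, 0.4),
    Word("so", 0.9, 1.3),
    Word("hi", 1.2, 1.5),
    Word("um", 3.0, 3.2),
]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.1, 2.0)]

for w in assign_speakers(words, turns):
    print(w.text, w.speaker)
# hello SPEAKER_00
# so SPEAKER_01
# hi SPEAKER_01
# um None
```

Choosing the largest overlap, rather than checking only a word's midpoint, handles words that straddle a turn boundary ("so" overlaps SPEAKER_01 for 0.2 s versus 0.1 s for SPEAKER_00). Words that fall in gaps between turns are left unlabeled instead of being forced onto a speaker.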
About whisper-v3-diarization
TharanaBope/whisper-v3-diarization
Production-ready audio transcription & speaker diarization CLI & GUI using OpenAI Whisper and WhisperX