whisperX and docker-whisperX
WhisperX is the core ASR and diarization library; docker-whisperX is a containerized distribution of it for easier deployment. The two are complements rather than alternatives: the Dockerfile packages the original tool for users who prefer containerized environments.
About whisperX
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Builds on OpenAI's Whisper by combining faster-whisper for batched GPU inference (up to 70x real-time transcription with large-v2, according to the project) with wav2vec2 forced phoneme alignment to achieve word-level timing accuracy. Integrates pyannote-audio for speaker diarization and includes VAD preprocessing to reduce hallucinations while maintaining quality. Supports multiple languages, automatically selecting a language-specific alignment model from HuggingFace or torchaudio.
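To make the combination of word-level timestamps and diarization concrete, here is a minimal Python sketch of how aligned words could be labeled with speakers by picking the diarization turn with the largest temporal overlap. This is an illustration of the idea only, not WhisperX's actual code: the function names (`overlap`, `assign_speakers`) and the dictionary shapes are hypothetical, and the sample timings are made up for demonstration.

```python
from typing import Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Label each word with the speaker whose turn overlaps it the most.

    `words` and `turns` are hypothetical shapes chosen for this sketch:
    words carry "word", "start", "end"; turns carry "speaker", "start", "end".
    A word that overlaps no turn gets speaker None.
    """
    labeled = []
    for word in words:
        best_speaker = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = turn["speaker"]
        labeled.append({**word, "speaker": best_speaker})
    return labeled


# Made-up example data: three aligned words and two speaker turns.
words = [
    {"word": "hello", "start": 0.10, "end": 0.45},
    {"word": "there", "start": 0.50, "end": 0.90},
    {"word": "hi", "start": 1.20, "end": 1.40},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]
labeled = assign_speakers(words, turns)
for item in labeled:
    print(item["word"], item["speaker"])
```

The quadratic loop is fine for a short clip; for long recordings a sorted sweep or an interval tree would be a more sensible choice.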
About docker-whisperX
jim60105/docker-whisperX
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization (Dockerfile, CI image build and test)
Optimizes layer caching and parallel builds to manage 175 pre-built Docker images (~10GB each) on GitHub's free runners, with weekly CI updates. Provides 40+ image variants with pre-baked models, covering Whisper model sizes from tiny to large-v3 across multiple languages, alongside a `no_model` tag for custom model selection. GPU acceleration is supported via the NVIDIA Container Toolkit.
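As a rough illustration of choosing between a pre-baked image and the `no_model` tag, the following Python sketch assembles a `docker run` command line. Several details are assumptions that are not stated here and should be checked against the repository's README: the image name `ghcr.io/jim60105/whisperx`, the `<model>-<language>` tag pattern, the `/app` mount point, and the `--` separator before the WhisperX arguments. The helper names `image_tag` and `docker_run_command` are hypothetical.

```python
from typing import List, Optional

# Assumption: registry path of the published images; verify before use.
IMAGE = "ghcr.io/jim60105/whisperx"


def image_tag(model: Optional[str] = None, language: Optional[str] = None) -> str:
    """Pick a tag: `no_model` when no model is pre-baked, else `<model>-<language>`.

    The `<model>-<language>` pattern is an assumption made for this sketch.
    """
    if model is None:
        return "no_model"
    if language is None:
        raise ValueError("a pre-baked model tag needs a language as well")
    return f"{model}-{language}"


def docker_run_command(
    audio_dir: str,
    whisperx_args: List[str],
    model: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `docker run` with GPU access and a mounted audio folder."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",              # requires the NVIDIA Container Toolkit on the host
        "-v", f"{audio_dir}:/app",    # assumption: the container works in /app
        f"{IMAGE}:{image_tag(model, language)}",
        "--",                         # assumption: separator before WhisperX arguments
        *whisperx_args,
    ]


cmd = docker_run_command("/data/audio", ["audio.mp3"], model="large-v3", language="en")
print(" ".join(cmd))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.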