whisperX and whisply
WhisperX provides the underlying speech-recognition and diarization engine with word-level timestamps, while Whisply is a higher-level application layer that wraps Whisper implementations (including WhisperX) to add batch processing and user interfaces, so the two are complements rather than direct competitors.
About whisperX
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Builds on OpenAI's Whisper by combining faster-whisper for batched GPU inference (up to 70x realtime) with wav2vec2 forced alignment to achieve accurate word-level timing. Integrates pyannote-audio for speaker diarization and applies VAD preprocessing to reduce hallucinations while maintaining quality. Supports multiple languages, automatically selecting a language-specific alignment model from HuggingFace or torchaudio.
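The two-step flow described above (batched transcription, then forced alignment) maps onto whisperX's documented Python API roughly as follows. This is a hedged sketch: the model name, audio path, device, and batch size are illustrative placeholders, and the `flatten_words` helper is written here for demonstration rather than being part of whisperX itself.

```python
def flatten_words(aligned_result):
    """Collect (word, start, end) triples from a whisperX-style aligned result
    ({"segments": [{"words": [{"word", "start", "end"}, ...]}]}).
    Helper written for this sketch; not part of whisperX itself."""
    triples = []
    for segment in aligned_result.get("segments", []):
        for w in segment.get("words", []):
            if "start" in w and "end" in w:  # alignment may skip some words
                triples.append((w["word"], w["start"], w["end"]))
    return triples

def run_whisperx(audio_path, device="cuda", batch_size=16):
    """Transcribe then align, following whisperX's documented two-step flow.
    Not executed in this sketch: it needs `pip install whisperx`, model
    downloads, and (with this device setting) a CUDA GPU."""
    import whisperx

    audio = whisperx.load_audio(audio_path)

    # Step 1: batched transcription via the faster-whisper backend.
    model = whisperx.load_model("large-v2", device)
    result = model.transcribe(audio, batch_size=batch_size)

    # Step 2: phoneme-level forced alignment with a language-specific
    # wav2vec2 model chosen from the detected language.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    return whisperx.align(result["segments"], align_model, metadata, audio, device)
```

Speaker diarization is a further optional step in whisperX, layered on top of the aligned result shown here.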
About whisply
tsmdt/whisply
💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and subtitle generation using OpenAI’s Whisper on CPU, Nvidia GPU and Apple MLX.
Leverages hardware-specific Whisper implementations (`faster-whisper` for CPU and Nvidia GPUs, `mlx-whisper` for Apple Silicon) with automatic device detection, and integrates `whisperX` and `pyannote` for word-level speaker diarization and customizable subtitle generation. Supports multiple export formats (JSON, SRT, VTT, HTML, RTTM) and batch processing via CLI, browser app, or config files for scalable transcription workflows.
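Of the export formats listed, SRT is simple enough to sketch from scratch. The snippet below is a generic, self-contained illustration of rendering word-timestamped segments as SRT cues; it shows the format itself and is not whisply's actual exporter code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render [{"start", "end", "text"}, ...] segments as an SRT string.
    Generic illustration of the format, not whisply's own implementation."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": "Hello there."},
    {"start": 2.5, "end": 4.0, "text": "Welcome back."},
]))
```

VTT differs mainly in its `WEBVTT` header and dot-separated milliseconds, so the same segment structure feeds both exporters.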