m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

/ 100

Verified

Builds on OpenAI's Whisper, using faster-whisper as the backend for batched GPU inference (up to 70x realtime with large-v2) and wav2vec2 forced phoneme alignment to produce accurate word-level timestamps. Integrates pyannote-audio for speaker diarization and applies VAD preprocessing to reduce hallucinations and enable batching without degrading transcription quality. Supports multiple languages, automatically selecting a language-specific alignment model from HuggingFace or torchaudio.
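To make the last stage of that pipeline concrete, here is a minimal sketch of how word-level timestamps and diarization turns can be merged into speaker-labeled words. It is an illustration only, not WhisperX's own code: the `Word` and `Turn` classes and the `assign_speakers` helper are hypothetical names, and the rule used (give each word the speaker whose turn overlaps it the most) is an assumed, simplified strategy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A transcribed word with aligned start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization turn: one speaker talking over a time span (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None.
    """
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        word.speaker = best_speaker
    return words


# Made-up example data: three aligned words plus one that straddles a turn boundary.
words = [
    Word("hello", 0.10, 0.45),
    Word("there", 0.50, 0.90),
    Word("well", 0.90, 1.30),   # straddles the boundary at 1.0 s
    Word("hi", 1.40, 1.60),
    Word("bye", 5.00, 5.20),    # falls outside every turn
]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]

labeled = assign_speakers(words, turns)
for w in labeled:
    print(f"{w.start:5.2f}-{w.end:5.2f}  {w.speaker}  {w.text}")
```

The straddling word goes to the speaker with the larger share of its duration, which is why accurate word boundaries from the alignment step matter for diarization quality.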

20,758 stars and 864,629 monthly downloads. Used by 5 other packages. Actively maintained with 15 commits in the last 30 days. Available on PyPI.

Maintenance 20 / 25

Adoption 25 / 25

Maturity 25 / 25

Community 20 / 25


Stars

20,758

Forks

2,188

Language

Python

License

BSD-2-Clause

Featured in

Things AI Won't Tell You About Building a Voice App

Choosing a Voice AI Library in 2026: What's Actually Worth Building On

Compare

whisperX and whisply

whisperX and whisper-diarization

whisperX and docker-whisperX

whisperX and whisper-run

whisperX and CrisperWhisper

whisperX and whisperVideo

whisperX and whisper-v3-diarization

Related tools

tsmdt/whisply

💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and...

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

linto-ai/linto-stt

An automatic speech recognition API

jim60105/docker-whisperX

Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker...

ringger/transcribe-critic

Multi-source transcript merging inspired by textual criticism — LLM adjudicates multiple...
