whisper-diarization and whisper-run
These are competing projects that offer similar end-to-end pipelines for combining Whisper ASR with speaker diarization. whisper-run emphasizes inference speed, while whisper-diarization has gained significantly more community adoption.
About whisper-diarization
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Combines Whisper with NVIDIA NeMo's voice activity detection and speaker embedding models (MarbleNet/TitaNet) to attribute transcribed text to individual speakers. Uses source separation (Demucs) for vocal extraction, CTC-forced alignment for precise timestamp correction, and punctuation-based realignment to compensate for temporal drift across segments. Outputs speaker-labeled transcriptions with segment-level timestamps, supporting configurable Whisper models and parallel inference modes for systems with sufficient VRAM.
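The speaker-attribution and punctuation-based realignment steps described above can be illustrated with a short Python sketch. It labels each word with the diarization turn it overlaps most, then gives every word in a sentence that sentence's majority speaker. The Word and SpeakerTurn types, the function names, and the sample timestamps are all hypothetical. This is not whisper-diarization's actual code, which takes its speaker turns from NeMo and its word timings from forced alignment; it is only a sketch of the general idea.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical types: a transcribed word and a diarization turn, times in seconds.
@dataclass
class Word:
    text: str
    start: float
    end: float


@dataclass
class SpeakerTurn:
    speaker: str
    start: float
    end: float


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most (turns must be non-empty)."""
    labels = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            overlap = min(w.end, t.end) - max(w.start, t.start)
            if overlap > best_overlap:
                best, best_overlap = t.speaker, overlap
        if best is None:
            # No positive overlap (e.g. the word falls in a gap): use the turn
            # whose midpoint is closest to the word's midpoint.
            mid = (w.start + w.end) / 2
            best = min(turns, key=lambda t: abs((t.start + t.end) / 2 - mid)).speaker
        labels.append(best)
    return labels


def realign_by_punctuation(words, labels, enders=(".", "?", "!")):
    """Give every word in a sentence the most common speaker label in that sentence."""
    realigned, sentence = [], []
    for i, w in enumerate(words):
        sentence.append(i)
        # A sentence ends at terminal punctuation or at the last word.
        if w.text.endswith(enders) or i == len(words) - 1:
            majority = Counter(labels[j] for j in sentence).most_common(1)[0][0]
            realigned.extend([majority] * len(sentence))
            sentence = []
    return realigned


# Example: the diarization boundary at 0.6 s drifts into the first sentence.
words = [
    Word("Nice", 0.0, 0.3),
    Word("to", 0.3, 0.5),
    Word("meet.", 0.5, 0.9),
    Word("Likewise.", 1.0, 1.5),
]
turns = [SpeakerTurn("SPEAKER_00", 0.0, 0.6), SpeakerTurn("SPEAKER_01", 0.6, 1.5)]

labels = assign_speakers(words, turns)
realigned = realign_by_punctuation(words, labels)
print(labels)     # "meet." is initially attributed to SPEAKER_01
print(realigned)  # the whole first sentence is realigned to SPEAKER_00
```

Voting per sentence means a turn boundary that drifts a word or two into a neighbouring sentence no longer splits that sentence between speakers. The trade-off is that genuine mid-sentence interruptions get absorbed into a single speaker.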
About whisper-run
gorkemkaramolla/whisper-run
Faster Whisper with Speaker Diarization