MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Quality score: 56 / 100 (Established)

Combines Whisper with NVIDIA NeMo's voice activity detection (MarbleNet) and speaker embedding (TitaNet) models to attribute transcribed text to individual speakers. Uses source separation (Demucs) for vocal extraction, CTC forced alignment for precise timestamp correction, and punctuation-based realignment to compensate for temporal drift across segments. Outputs speaker-labeled transcriptions with segment-level timestamps, and supports configurable Whisper models and a parallel inference mode for systems with sufficient VRAM.


No package published · No dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25
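As a sanity check on the card's arithmetic, the four sub-scores sum exactly to the overall score. A minimal sketch, assuming the composite is a plain sum of the sub-scores (the scoring formula is not documented here):

```python
# Sub-scores as shown on the card (each out of 25).
subscores = {
    "Maintenance": 10,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 20,
}

# Assumption: the overall score is the plain sum of the sub-scores,
# out of 25 points per category. This reproduces the card's 56 / 100.
total = sum(subscores.values())
maximum = 25 * len(subscores)
print(f"{total} / {maximum}")  # → 56 / 100
```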


Stars: 5,437
Forks: 500
Language: Jupyter Notebook
License: BSD-2-Clause
Last pushed: Feb 23, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/MahmoudAshraf97/whisper-diarization"

Open to everyone: 100 requests/day with no key required. Get a free API key for 1,000 requests/day.
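The endpoint above follows an owner/repo URL pattern, so the same data can be fetched programmatically for any repository. A minimal Python sketch of the URL construction; `quality_url` is a hypothetical helper name, and the response schema is not documented here, so only the request URL is built:

```python
import urllib.parse

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair.

    The path segments are percent-encoded defensively; for typical
    GitHub names this is a no-op. Fetch the URL with any HTTP client,
    e.g. urllib.request.urlopen(quality_url(...)).
    """
    return f"{BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"

print(quality_url("MahmoudAshraf97", "whisper-diarization"))
```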