MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Combines Whisper with NVIDIA NeMo's voice activity detection and speaker embedding models (MarbleNet/TitaNet) to attribute transcribed text to individual speakers. Uses source separation (Demucs) for vocal extraction, CTC forced alignment for precise timestamp correction, and punctuation-based realignment to compensate for temporal drift across segments. Outputs speaker-labeled transcriptions with segment-level timestamps, supporting configurable Whisper models and parallel inference modes on systems with sufficient VRAM.
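The core attribution step can be sketched as follows: given word-level timestamps from the aligned ASR output and speaker turns from the diarizer, each word is assigned to the speaker whose turn overlaps it most. This is a minimal illustration with toy data; the function names and data shapes are assumptions, not the repo's actual API.

```python
# Minimal sketch of timestamp-based speaker attribution (illustrative only;
# names and data shapes are assumptions, not the repo's actual interfaces).

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words, turns):
    """Attach a speaker label to each word.

    words: [(start, end, text), ...] from forced-aligned ASR output
    turns: [(start, end, speaker), ...] from the diarizer
    """
    labeled = []
    for w_start, w_end, text in words:
        # Pick the diarization turn with the largest temporal overlap.
        best = max(turns, key=lambda t: overlap(w_start, w_end, t[0], t[1]))
        labeled.append((w_start, w_end, text, best[2]))
    return labeled

words = [(0.0, 0.4, "hello"), (0.5, 0.9, "there"), (1.2, 1.6, "hi")]
turns = [(0.0, 1.0, "SPEAKER_00"), (1.0, 2.0, "SPEAKER_01")]
for s, e, text, spk in assign_speakers(words, turns):
    print(f"[{s:.1f}-{e:.1f}] {spk}: {text}")
```

In practice this is where punctuation-based realignment matters: without it, drifting word timestamps near turn boundaries get attributed to the wrong speaker.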
Stars: 5,437
Forks: 500
Language: Jupyter Notebook
License: BSD-2-Clause
Category:
Last pushed: Feb 23, 2026
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/MahmoudAshraf97/whisper-diarization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
tsmdt/whisply
💬 Fast, cross-platform CLI and GUI for batch transcription, translation, speaker annotation and...
linto-ai/linto-stt
An automatic speech recognition API
jim60105/docker-whisperX
Dockerfile for WhisperX: Automatic Speech Recognition with Word-Level Timestamps and Speaker...
ringger/transcribe-critic
Multi-source transcript merging inspired by textual criticism — LLM adjudicates multiple...