WhisperLive and WhisperSpeech

These are complementary tools that form a bidirectional speech processing pipeline: WhisperLive enables real-time speech-to-text conversion while WhisperSpeech enables text-to-speech synthesis, allowing audio content to be transcribed and regenerated within a single workflow.
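The round trip described above can be sketched as a composition of the two tools. The functions below are stubs standing in for the real model calls (the stub names and fake outputs are assumptions for illustration, not either project's API):

```python
def transcribe(audio_frames):
    """Stand-in for WhisperLive's speech-to-text step (stub)."""
    # A real client would stream frames to the WhisperLive server
    # and receive incremental transcripts back.
    return " ".join(f"word{i}" for i, _ in enumerate(audio_frames))

def synthesize(text):
    """Stand-in for WhisperSpeech's text-to-speech step (stub)."""
    # A real pipeline maps text to tokens and tokens to a waveform;
    # here we just fake 100 samples per word.
    return [0.0] * (len(text.split()) * 100)

def round_trip(audio_frames):
    # Transcribe incoming audio, then regenerate it as synthetic speech.
    return synthesize(transcribe(audio_frames))

samples = round_trip([b"\x00" * 320] * 3)
print(len(samples))  # 3 frames -> 3 words -> 300 fake samples
```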

                 WhisperLive        WhisperSpeech
Score            68 (Established)   50 (Established)
Maintenance      20/25              6/25
Adoption         10/25              10/25
Maturity         16/25              16/25
Community        22/25              18/25
Stars            3,894              4,575
Forks            536                269
Downloads        n/a                n/a
Commits (30d)    13                 0
Language         Python             Jupyter Notebook
License          MIT                MIT
Package          none (no dependents)   none (no dependents)

About WhisperLive

collabora/WhisperLive

A nearly-live implementation of OpenAI's Whisper.

Supports multiple inference backends (Faster-Whisper, TensorRT-LLM, and OpenVINO) for optimized performance across different hardware, with pluggable model sizes and a client-server architecture that handles concurrent transcription sessions. Features include Voice Activity Detection, real-time translation, and OpenAI-compatible REST API endpoints alongside native WebSocket streaming for low-latency audio input from microphones or files.
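As a rough sketch of the client side of such a streaming setup, the following cuts raw PCM audio into fixed-duration frames of the kind a client sends over a WebSocket. The sample rate, bit depth, and frame duration are illustrative assumptions, not WhisperLive's actual wire format:

```python
# Illustrative only: chunking 16 kHz, 16-bit mono PCM into ~250 ms frames,
# the kind of framing a streaming STT client performs before sending audio.
SAMPLE_RATE = 16_000   # assumed sample rate (Hz)
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHUNK_MS = 250         # assumed frame duration

def chunk_pcm(pcm: bytes):
    """Yield fixed-size chunks of raw PCM; the final chunk may be shorter."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]

# One second of silence -> four 250 ms chunks of 8,000 bytes each.
audio = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)
chunks = list(chunk_pcm(audio))
print(len(chunks), len(chunks[0]))  # 4 8000
```

Sending small fixed-size frames like this is what keeps end-to-end latency low: the server can start transcribing each frame as it arrives rather than waiting for the full recording.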

About WhisperSpeech

WhisperSpeech/WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Uses a token-based pipeline that maps text → semantic tokens (via a model derived from Whisper) → acoustic tokens (via EnCodec) → waveform (via the Vocos vocoder), enabling voice cloning and multilingual code-switching. Achieves roughly 12× real-time inference on consumer GPUs through `torch.compile` and KV-caching, and its models are trained exclusively on properly licensed data for commercial viability.
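The token-based pipeline above can be illustrated as a chain of stages. The functions below are placeholders for the real Whisper/EnCodec/Vocos models and only demonstrate the data flow and where conditioning for voice cloning would enter; all token counts and vocabulary sizes are made up:

```python
def text_to_semantic(text):
    # Placeholder for the Whisper-derived text -> semantic-token stage.
    # Fake tokenization: one token per word, vocabulary of 512.
    return [hash(w) % 512 for w in text.split()]

def semantic_to_acoustic(semantic_tokens, speaker_embedding=None):
    # Placeholder for the EnCodec acoustic-token stage; speaker_embedding
    # is where a voice-cloning condition would be injected.
    return [t % 1024 for t in semantic_tokens for _ in range(3)]

def acoustic_to_waveform(acoustic_tokens):
    # Placeholder for the Vocos vocoder turning tokens into audio samples
    # (fake upsampling: 160 samples per acoustic token).
    return [t / 1024.0 for t in acoustic_tokens for _ in range(160)]

def tts(text):
    # Compose the three stages: text -> semantic -> acoustic -> waveform.
    return acoustic_to_waveform(semantic_to_acoustic(text_to_semantic(text)))

wave = tts("hello world")
print(len(wave))  # 2 words -> 2 semantic -> 6 acoustic -> 960 samples
```

Splitting synthesis into token stages like this is what makes cloning and code-switching tractable: the semantic stage captures *what* is said, while the acoustic stage captures *how* it sounds.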

Scores updated daily from GitHub, PyPI, and npm data.