WhisperLive and WhisperSpeech
These are complementary tools that form a bidirectional speech processing pipeline: WhisperLive enables real-time speech-to-text conversion while WhisperSpeech enables text-to-speech synthesis, allowing audio content to be transcribed and regenerated within a single workflow.
About WhisperLive
collabora/WhisperLive
A nearly-live implementation of OpenAI's Whisper.
Supports multiple inference backends (Faster-Whisper, TensorRT-LLM, and OpenVINO) for optimized performance across different hardware, with pluggable model sizes and a client-server architecture for concurrent transcription. Features Voice Activity Detection to skip silence, real-time translation via Whisper's translate task, and OpenAI-compatible REST API endpoints alongside native WebSocket streaming for low-latency audio input from microphones or files.
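The Voice Activity Detection stage is what keeps a live transcriber from wasting inference on silence: incoming audio is split into short frames, and only frames with speech-like energy are forwarded to the model. A minimal, self-contained sketch of that gating idea (illustrative only — WhisperLive uses a trained VAD model, not this energy heuristic):

```python
import math

def rms_energy(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def vad_gate(frames, threshold=0.01):
    """Keep only frames whose energy exceeds the threshold,
    mimicking how a VAD stage drops silence before transcription."""
    return [f for f in frames if rms_energy(f) > threshold]

# Two near-silent frames and one "speech" frame
silence = [0.001] * 160
speech = [0.5, -0.5] * 80
kept = vad_gate([silence, speech, silence])
print(len(kept))  # 1 — only the speech frame survives the gate
```

A real deployment would run this per 10–30 ms frame on the streaming WebSocket input, so the server only transcribes segments flagged as speech.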
About WhisperSpeech
WhisperSpeech/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
Uses a two-stage token-based pipeline that maps text → semantic tokens (via Whisper) → acoustic tokens (via EnCodec) → waveform (via Vocos vocoder), enabling voice cloning and multilingual code-switching. Achieves 12× real-time inference on consumer GPUs through `torch.compile` and KV-caching, with models trained exclusively on properly licensed data for commercial viability.
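The dataflow above can be sketched as three composed functions. The stubs below are hypothetical stand-ins for the real neural models (none of these names are WhisperSpeech's API); they exist only to show the shape of the token pipeline:

```python
# Illustrative dataflow only: each stub fakes the tensor work of one stage.

def text_to_semantic(text):
    """Stage 1 stand-in: a text-to-semantic model predicts Whisper-derived
    semantic tokens; here we fake one token per word."""
    return [hash(w) % 512 for w in text.split()]

def semantic_to_acoustic(semantic_tokens):
    """Stage 2 stand-in: a semantic-to-acoustic model predicts EnCodec
    codebook indices; here we fake three acoustic tokens per semantic token."""
    return [(t + i) % 1024 for t in semantic_tokens for i in range(3)]

def vocode(acoustic_tokens):
    """Vocoder stand-in: Vocos maps acoustic tokens back to waveform samples."""
    return [t / 1024.0 for t in acoustic_tokens]

# text -> semantic tokens -> acoustic tokens -> waveform
wave = vocode(semantic_to_acoustic(text_to_semantic("hello world")))
print(len(wave))  # 6: 2 words -> 2 semantic tokens -> 6 acoustic tokens
```

The token bottleneck is what enables voice cloning: the speaker identity lives in the semantic-to-acoustic stage, so the same semantic tokens can be rendered in a different voice.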