WhisperLive and WhisperSpeech
These are complementary tools that form a bidirectional speech processing pipeline: WhisperLive enables real-time speech-to-text conversion while WhisperSpeech enables text-to-speech synthesis, allowing audio content to be transcribed and regenerated within a single workflow.
About WhisperLive
collabora/WhisperLive
A nearly-live implementation of OpenAI's Whisper.
Supports multiple inference backends (Faster-Whisper, TensorRT-LLM, and OpenVINO) for optimized performance across different hardware, with pluggable model sizes and a client-server architecture for concurrent transcription. Features Voice Activity Detection to skip silence, real-time translation via Whisper's translate task, and OpenAI-compatible REST API endpoints alongside native WebSocket streaming for low-latency audio input from microphones or files.
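The Voice Activity Detection stage is what keeps a live transcriber from wasting inference on silence: incoming audio is split into short frames, and only frames with speech-like energy are forwarded to the model. A minimal, self-contained sketch of that gating idea (illustrative only — WhisperLive uses a trained VAD model, not this energy heuristic):

```python
import math

def rms_energy(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def vad_gate(frames, threshold=0.01):
    """Keep only frames whose energy exceeds the threshold,
    mimicking how a VAD stage drops silence before transcription."""
    return [f for f in frames if rms_energy(f) > threshold]

# Two near-silent frames and one "speech" frame
silence = [0.001] * 160
speech = [0.5, -0.5] * 80
kept = vad_gate([silence, speech, silence])
print(len(kept))  # 1 — only the speech frame survives the gate
```

A real deployment would run this per 10–30 ms frame on the streaming WebSocket input, so the server only transcribes segments flagged as speech.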
About WhisperSpeech
WhisperSpeech/WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
Uses a two-stage token-based pipeline that maps text → semantic tokens (via Whisper) → acoustic tokens (via EnCodec) → waveform (via Vocos vocoder), enabling voice cloning and multilingual code-switching. Achieves 12× real-time inference on consumer GPUs through `torch.compile` and KV-caching, with models trained exclusively on properly licensed data for commercial viability.
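The dataflow above can be sketched as three composed functions. The stubs below are hypothetical stand-ins for the real neural models (none of these names are WhisperSpeech's API); they exist only to show the shape of the token pipeline:

```python
# Illustrative dataflow only: each stub fakes the tensor work of one stage.

def text_to_semantic(text):
    """Stage 1 stand-in: a text-to-semantic model predicts Whisper-derived
    semantic tokens; here we fake one token per word."""
    return [hash(w) % 512 for w in text.split()]

def semantic_to_acoustic(semantic_tokens):
    """Stage 2 stand-in: a semantic-to-acoustic model predicts EnCodec
    codebook indices; here we fake three acoustic tokens per semantic token."""
    return [(t + i) % 1024 for t in semantic_tokens for i in range(3)]

def vocode(acoustic_tokens):
    """Vocoder stand-in: Vocos maps acoustic tokens back to waveform samples."""
    return [t / 1024.0 for t in acoustic_tokens]

# text -> semantic tokens -> acoustic tokens -> waveform
wave = vocode(semantic_to_acoustic(text_to_semantic("hello world")))
print(len(wave))  # 6: 2 words -> 2 semantic tokens -> 6 acoustic tokens
```

The token bottleneck is what enables voice cloning: the speaker identity lives in the semantic-to-acoustic stage, so the same semantic tokens can be rendered in a different voice.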