HeadTTS and kokoro-tts-addon
HeadTTS is a WebGPU/WASM-based neural text-to-speech engine that runs in-browser or under local Node.js and provides timestamps and visemes, while kokoro-tts-addon is a local neural TTS addon built specifically for browsers. The two are **complementary**: both provide offline, browser-accessible Kokoro TTS, and the addon could potentially integrate or build on the engine.
About HeadTTS
met4citizen/HeadTTS
HeadTTS: Free neural text-to-speech (Kokoro) with timestamps and visemes for lip-sync. Runs in-browser (WebGPU/WASM) or on local Node.js WebSocket/REST server (CPU).
Leverages transformers.js with ONNX Runtime for client-side model execution, supporting both WebGPU acceleration and WASM fallback with configurable quantization levels (fp32/fp16/q8/q4). Provides phoneme-level timing data and Oculus-compatible visemes for precise lip-sync animation, with adjustable timing offsets for integration with 3D avatar frameworks like TalkingHead. Supports flexible endpoint configuration with automatic fallback between in-browser and Node.js server backends, enabling graceful degradation across browsers and deployment scenarios.
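The graceful-degradation idea described above can be sketched as a simple backend-selection step: prefer WebGPU, fall back to WASM (typically with heavier quantization), and finally to a local server endpoint. This is a minimal illustrative sketch; the function name, environment flags, and return shape are assumptions for illustration, not the actual HeadTTS API.

```javascript
// Pick a TTS backend in order of preference: WebGPU, WASM, local server.
// All names here are illustrative, not HeadTTS's real interface.
function pickBackend(env) {
  // `env` is assumed to come from a feature-detection step.
  if (env.webgpu) {
    // Fast path: GPU-accelerated in-browser inference, lighter quantization.
    return { backend: "webgpu", dtype: "fp16" };
  }
  if (env.wasm) {
    // CPU fallback in the browser; q8 keeps the model small and fast enough.
    return { backend: "wasm", dtype: "q8" };
  }
  if (env.serverUrl) {
    // Last resort: delegate synthesis to a local Node.js server over HTTP/WebSocket.
    return { backend: "server", url: env.serverUrl };
  }
  throw new Error("No TTS backend available");
}

// A browser without WebGPU support degrades to WASM with q8 quantization.
const choice = pickBackend({ webgpu: false, wasm: true });
```

The same ordering applies per deployment: a capable desktop browser gets WebGPU, older browsers get WASM, and locked-down environments can point at the Node.js server.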
About kokoro-tts-addon
pinguy/kokoro-tts-addon
Local neural TTS for Browsers: fast, expressive, and offline—runs on modest hardware.
Implements a Flask-based local server paired with the 82M-parameter Kokoro model, enabling multi-voice synthesis with support for nine languages and accents through a Firefox extension popup. The architecture separates the inference backend from the browser frontend via HTTP, supporting both CPU and GPU acceleration while maintaining real-time performance even on legacy hardware like 2013 Xeons.
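The frontend/backend split described above amounts to the extension popup building a synthesis request and POSTing it to the local Flask server. The sketch below shows that shape; the URL, path, and field names are hypothetical placeholders, not the addon's documented API.

```javascript
// Build an HTTP synthesis request from the browser frontend to the local
// inference backend. Endpoint and payload fields are hypothetical.
function buildSynthesisRequest(text, voice = "af_heart", lang = "en-us") {
  return {
    url: "http://127.0.0.1:5000/synthesize", // hypothetical local Flask endpoint
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, voice, lang }),
    },
  };
}

// In the extension popup, something like fetch(req.url, req.options)
// would then return the synthesized audio from the local server.
const req = buildSynthesisRequest("Hello from the extension");
```

Keeping inference behind a plain HTTP boundary is what lets the backend swap between CPU and GPU without any change to the browser side.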