HeadTTS and StreamingKokoroJS
These are ecosystem siblings: HeadTTS provides a full-stack TTS implementation with lip-sync features and server options, while StreamingKokoroJS offers a browser-native alternative optimized for streaming inference. Both build on the same underlying Kokoro model but target different deployment preferences.
About HeadTTS
met4citizen/HeadTTS
HeadTTS: Free neural text-to-speech (Kokoro) with timestamps and visemes for lip-sync. Runs in-browser (WebGPU/WASM) or on local Node.js WebSocket/REST server (CPU).
Leverages transformers.js with ONNX Runtime for client-side model execution, supporting both WebGPU acceleration and WASM fallback with configurable quantization levels (fp32/fp16/q8/q4). Provides phoneme-level timing data and Oculus-compatible visemes for precise lip-sync animation, with adjustable timing offsets for integration with 3D avatar frameworks like TalkingHead. Supports flexible endpoint configuration with automatic fallback between in-browser and Node.js server backends, enabling graceful degradation across browsers and deployment scenarios.
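The adjustable timing offset mentioned above can be illustrated with a small sketch. The event shape and function name below are hypothetical, not HeadTTS's actual schema: it assumes viseme events carry a start time in milliseconds, which an avatar integration shifts to compensate for animation latency.

```javascript
// Hypothetical event shape: viseme ID plus start time in milliseconds.
// Field names are illustrative, not HeadTTS's actual output schema.
function applyVisemeOffset(visemes, offsetMs) {
  // Shift every viseme event by offsetMs (negative = earlier), clamping
  // at zero so the first mouth shape never precedes the audio itself.
  return visemes.map((v) => ({
    ...v,
    time: Math.max(0, v.time + offsetMs),
  }));
}

const events = [
  { viseme: "PP", time: 0 },
  { viseme: "aa", time: 120 },
  { viseme: "SS", time: 260 },
];

// Start mouth shapes 50 ms early to hide avatar animation latency.
const shifted = applyVisemeOffset(events, -50);
console.log(shifted);
// → [{ viseme: "PP", time: 0 }, { viseme: "aa", time: 70 }, { viseme: "SS", time: 210 }]
```

A negative offset is the common case when the 3D avatar's blend-shape animation needs a few frames of lead time before the corresponding audio sample plays.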
About StreamingKokoroJS
rhulha/StreamingKokoroJS
Unlimited text-to-speech in the Browser using Kokoro-JS, 100% local, 100% open source
Leverages the Kokoro-82M-v1.0-ONNX model (~300MB) with WebGPU acceleration and WASM fallback for hardware-adaptive processing, using Web Workers to keep model inference off the main thread and prevent UI blocking during generation. Implements intelligent text chunking to stream audio chunks as they are generated, maintaining natural speech patterns across multiple voice styles at a 24 kHz sample rate. Supports local model loading for offline deployment while preserving full privacy through 100% client-side inference.
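The streaming approach rests on sentence-aware chunking: split input at sentence boundaries, then pack sentences into chunks under a size budget, so each chunk can be synthesized and played back while the next is still generating. The sketch below is a minimal illustration of that idea; the splitting rules and the 200-character budget are assumptions, not StreamingKokoroJS's actual implementation.

```javascript
// Minimal sketch of sentence-aware chunking for streaming TTS.
// Splitting heuristics and the default budget are illustrative assumptions.
function chunkText(text, maxChars = 200) {
  // Grab sentence-like runs ending in ., !, or ?, plus any trailing fragment.
  const sentences = text.match(/[^.!?]+[.!?]+|\S[^.!?]*$/g) ?? [];
  const chunks = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (current && current.length + sentence.length + 1 > maxChars) {
      // Budget exceeded: emit the accumulated chunk, start a new one.
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkText("One. Two. Three.", 8));
// → ["One.", "Two.", "Three."]
```

Keeping chunks at sentence boundaries is what preserves natural prosody: the model never has to synthesize a clause that was cut mid-thought, and playback of chunk N can begin as soon as it is ready, while chunk N+1 is still in the worker.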