kokoro-onnx and Kokoro-FastAPI
The two projects are complementary: kokoro-onnx provides a lightweight ONNX Runtime inference engine for the Kokoro model, while Kokoro-FastAPI wraps that same model in a production-ready HTTP server that targets both CPU (ONNX) and NVIDIA GPU (PyTorch) backends.
About kokoro-onnx
thewh1teagle/kokoro-onnx
TTS with kokoro and onnx runtime
Leverages ONNX Runtime for CPU- and GPU-accelerated inference, with quantized models as small as 80 MB that enable near real-time synthesis on modest hardware such as an M1 Mac. Supports 82+ voices across multiple languages, with optional grapheme-to-phoneme conversion via the misaki package for improved pronunciation accuracy. Provides a lightweight, self-contained alternative to larger TTS systems while maintaining compatibility with standard audio output formats.
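A minimal sketch of local synthesis with kokoro-onnx, based on the usage pattern in the project's README. The model and voices filenames, the voice name, and the exact Kokoro/create signature are assumptions; check the repository's release page for the current artifact names.

```python
from pathlib import Path

# Hypothetical local filenames; download the ONNX model and voices file
# from the kokoro-onnx release page before running.
MODEL = Path("kokoro-v1.0.onnx")
VOICES = Path("voices-v1.0.bin")

def synthesize(text: str, voice: str = "af_sarah", speed: float = 1.0):
    """Run Kokoro TTS via ONNX Runtime; returns (samples, sample_rate)."""
    # Imported lazily so this sketch loads even without the package installed.
    from kokoro_onnx import Kokoro  # pip install kokoro-onnx

    kokoro = Kokoro(str(MODEL), str(VOICES))
    return kokoro.create(text, voice=voice, speed=speed, lang="en-us")

if MODEL.exists() and VOICES.exists():
    samples, sample_rate = synthesize("Hello from Kokoro!")
    print(f"generated {len(samples)} samples at {sample_rate} Hz")
```

The returned samples are a raw waveform array, so they can be written out with any standard audio library (e.g. soundfile) at the reported sample rate.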
About Kokoro-FastAPI
remsky/Kokoro-FastAPI
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
Provides phoneme-level control with per-word timestamped captions and supports voice mixing via weighted combinations, enabling fine-grained audio generation and synthesis customization. Implements an OpenAI-compatible Speech API endpoint for drop-in integration with existing applications while offering a built-in web UI for standalone use. Includes Kubernetes/Helm deployment support and integrations with popular AI frameworks like SillyTavern and OpenWebUI.
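Because the server mirrors OpenAI's Speech API, a client can post a standard /v1/audio/speech request and receive audio bytes back. This is a sketch using only the standard library; the port 8880, the "kokoro" model name, and the "+"-separated voice-mixing syntax are assumptions drawn from the project's README, not guaranteed values.

```python
import json
from urllib import error, request

# Assumed default base URL for a locally running Kokoro-FastAPI container.
BASE_URL = "http://localhost:8880/v1"

# OpenAI-compatible request body; "voice1+voice2" is the assumed
# syntax for mixing two voices into a weighted combination.
payload = {
    "model": "kokoro",
    "input": "Hello world!",
    "voice": "af_sky+af_bella",
    "response_format": "mp3",
}

def speak(base_url: str = BASE_URL) -> bytes:
    """POST the payload to the speech endpoint and return raw audio bytes."""
    req = request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()

try:
    audio = speak()
    print(f"received {len(audio)} bytes of audio")
except error.URLError:
    print("server not reachable; start the Kokoro-FastAPI container first")
```

Since the request shape matches OpenAI's, existing clients (including the official openai SDK with a custom base_url) can be pointed at this server without code changes.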