kokoro-onnx and kokoclone
The ONNX runtime implementation provides the core inference engine that the voice cloning tool uses to generate and manipulate speech efficiently, so the two projects complement each other in a production pipeline.
About kokoro-onnx
thewh1teagle/kokoro-onnx
TTS with kokoro and onnx runtime
Leverages ONNX Runtime for CPU and GPU-accelerated inference with quantized models as small as 80MB, enabling near real-time synthesis on resource-constrained devices like M1 Macs. Supports 82+ voices across multiple languages with optional grapheme-to-phoneme conversion via the misaki package for improved pronunciation accuracy. Provides a lightweight, self-contained alternative to larger TTS systems while maintaining compatibility with standard audio output formats.
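Synthesis in this kind of pipeline yields raw float samples plus a sample rate, which are then written out in a standard audio format. A minimal standard-library sketch of that last step (the sine-wave stand-in and the 24 kHz rate are assumptions for illustration, not actual kokoro-onnx output):

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit PCM
        wf.setframerate(sample_rate)
        # Clamp each float to [-1, 1] and scale to int16 before packing.
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wf.writeframes(frames)

# Stand-in for model output: one second of a 440 Hz tone at an assumed
# 24 kHz rate (a real run would use the samples the TTS engine returns).
sample_rate = 24000
samples = [0.3 * math.sin(2 * math.pi * 440 * t / sample_rate)
           for t in range(sample_rate)]
write_wav("out.wav", samples, sample_rate)
```

Because the output is plain 16-bit PCM WAV, it plays anywhere and needs no extra dependencies beyond the Python standard library.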
About kokoclone
Ashish-Patnaik/kokoclone
Voice Cloning, Now Inside Kokoro. Generate natural multilingual speech and clone any target voice with ease.
Combines Kokoro-ONNX TTS with the Kanade voice-conversion model to enable both text-to-speech cloning and audio-to-audio re-voicing without transcription. Features VRAM-aware chunking with RoPE-ceiling enforcement for processing long recordings while respecting Transformer positional embedding limits, plus automatic hardware detection for CPU/GPU optimization. Supports 8 languages through a unified API layer exposed via Gradio web UI, CLI, and Python SDK.
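The chunking idea can be sketched as a pure function: cap each chunk so its position count stays under the Transformer's RoPE ceiling, with a small overlap to smooth chunk seams. The ceiling, samples-per-position ratio, and overlap below are illustrative assumptions, not kokoclone's actual constants:

```python
def chunk_bounds(n_samples, sample_rate, max_positions=4096,
                 samples_per_position=320, overlap_s=0.5):
    """Split a long recording into (start, end) sample ranges so each
    chunk's position count stays under a RoPE positional ceiling.

    `max_positions` and `samples_per_position` are illustrative; a real
    pipeline would take them from the model config, and a VRAM-aware
    variant could shrink chunks further when free GPU memory is low.
    """
    max_chunk = max_positions * samples_per_position  # samples per chunk
    overlap = int(overlap_s * sample_rate)            # seam overlap
    step = max_chunk - overlap
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + max_chunk, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break
        start += step
    return bounds
```

Each chunk is then converted independently and the overlapping regions cross-faded on reassembly, so arbitrarily long recordings never exceed the model's positional-embedding limit.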