ChatTTS and xiaogpt
A generative speech model complements an AI speaker interface: it supplies the speech synthesis the speaker uses to vocalize responses from a large language model.
About ChatTTS
2noise/ChatTTS
A generative speech model for daily dialogue.
Built on a transformer architecture and trained on 100,000+ hours of multilingual audio, ChatTTS offers fine-grained prosodic control through special tokens for laughter, pauses, and interjections, and supports multiple speakers via speaker embeddings. The model includes a discrete VAE encoder for zero-shot speaker inference and can generate audio in streaming mode. It currently supports English and Chinese, with additional languages planned.
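The token-based prosody control described above can be illustrated with a small text-preparation sketch. It only builds the annotated input string and does not call ChatTTS itself. The helper names (`with_pauses`, `add_laugh`) are hypothetical, and the token spellings `[laugh]` and `[uv_break]` are assumed from the ChatTTS README, so check them against the version you install.

```python
# Minimal sketch: insert prosody control tokens into plain text before synthesis.
# Assumption: "[laugh]" and "[uv_break]" are the token spellings the model accepts.
LAUGH = "[laugh]"
PAUSE = "[uv_break]"


def with_pauses(sentences, pause=PAUSE):
    """Join non-empty sentences with a pause token between them (hypothetical helper)."""
    cleaned = [s.strip() for s in sentences if s.strip()]
    return f" {pause} ".join(cleaned)


def add_laugh(text, laugh=LAUGH):
    """Append a laughter token to the end of the text (hypothetical helper)."""
    return f"{text.rstrip()} {laugh}"


prompt = add_laugh(with_pauses(["Well", "that was unexpected."]))
print(prompt)
```

The resulting string would then be passed to the model as ordinary input text. Keeping the token insertion in plain functions makes it easy to change spellings if the model's vocabulary differs.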
About xiaogpt
yihong0618/xiaogpt
Play ChatGPT and other LLM with Xiaomi AI Speaker
Supports multiple LLM backends (ChatGPT, Gemini, Claude, local Llama3, etc.) with pluggable TTS engines (Edge, OpenAI, Azure, Fish Audio). Integrates with Xiaomi's MiService SDK to authenticate and control speakers via the Mina protocol, with optional LangChain support for web search and advanced reasoning tasks. Configuration is done through YAML/JSON files or CLI arguments, and a streaming mode keeps conversational responses close to real time.
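As a rough illustration of what "pluggable TTS engines" plus file-or-CLI configuration can look like, here is a minimal sketch. It is not xiaogpt's actual code: the registry, the `dummy` engine, the setting names `tts` and `stream`, and the rule that CLI values override file values are all assumptions made for the example.

```python
from typing import Callable, Dict

# Hypothetical engine registry: maps an engine name to a text -> audio-bytes function.
TTS_ENGINES: Dict[str, Callable[[str], bytes]] = {}


def register_tts(name: str):
    """Decorator that registers a TTS engine under a name."""
    def decorator(fn: Callable[[str], bytes]) -> Callable[[str], bytes]:
        TTS_ENGINES[name] = fn
        return fn
    return decorator


@register_tts("dummy")
def dummy_tts(text: str) -> bytes:
    # Stand-in engine: returns UTF-8 bytes instead of real audio.
    return text.encode("utf-8")


def load_settings(file_config: dict, cli_overrides: dict) -> dict:
    """Merge defaults, file values, then CLI values (assumed precedence order)."""
    defaults = {"tts": "dummy", "stream": False}
    merged = {**defaults, **file_config}
    # Only CLI arguments that were actually given (not None) override the file.
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged


def speak(text: str, settings: dict) -> bytes:
    """Look up the configured engine by name and synthesize the text."""
    engine = TTS_ENGINES[settings["tts"]]
    return engine(text)


settings = load_settings({"stream": True}, {"tts": None})
audio = speak("hello", settings)
```

With a registry like this, adding an engine means registering one more function, and switching engines is a one-line configuration change.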