asiff00/On-Device-Speech-to-Speech-Conversational-AI
An on-CPU, real-time conversational system for two-way speech communication with AI models. A continuous streaming architecture keeps conversations fluid, with immediate responses and natural interruption handling.
Implements a modular multi-threaded pipeline combining Pyannote VAD, Whisper speech recognition, Ollama/LM Studio language models, and Kokoro TTS, all running locally on CPU. Uses priority-based text chunking that streams TTS output as language model tokens arrive, and strategically injects filler words into LLM prompts to reduce perceived latency by 50-70%. Achieves ~1.5s end-to-end response time on mid-range hardware through queue-based inter-component coordination and user interrupt handling.
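The core pattern described above, streaming LLM tokens into a TTS queue one sentence at a time, with an interrupt flag, can be sketched in a few lines. This is a hypothetical illustration of the technique, not the repository's actual code: the token list, `chunk_tokens`, and `tts_worker` are stand-ins, and a real implementation would call the Kokoro TTS engine where this sketch just collects chunks.

```python
import queue
import threading

SENTENCE_ENDS = {".", "!", "?"}

def chunk_tokens(tokens, tts_queue, stop_event):
    """Accumulate streamed tokens; flush a chunk to TTS at each sentence end."""
    buf = []
    for tok in tokens:
        if stop_event.is_set():          # user interrupt: drop remaining tokens
            break
        buf.append(tok)
        if tok.strip() and tok.strip()[-1] in SENTENCE_ENDS:
            tts_queue.put("".join(buf).strip())
            buf = []
    if buf and not stop_event.is_set():  # flush any trailing partial sentence
        tts_queue.put("".join(buf).strip())
    tts_queue.put(None)                  # sentinel: stream finished

def tts_worker(tts_queue, spoken):
    """Consume chunks as they arrive; a real worker would synthesize audio here."""
    while True:
        chunk = tts_queue.get()
        if chunk is None:
            break
        spoken.append(chunk)             # stand-in for TTS playback

def run_pipeline(tokens):
    tts_queue = queue.Queue()
    stop_event = threading.Event()
    spoken = []
    worker = threading.Thread(target=tts_worker, args=(tts_queue, spoken))
    worker.start()
    chunk_tokens(tokens, tts_queue, stop_event)
    worker.join()
    return spoken

tokens = ["Sure", ",", " let", " me", " check", ".", " One", " moment", "!"]
print(run_pipeline(tokens))              # first sentence is speakable before the stream ends
```

Because each chunk is enqueued as soon as its closing punctuation arrives, speech for the first sentence can begin while the model is still generating the rest, which is what hides most of the generation latency.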
Stars
242
Forks
43
Language
Python
License
MIT
Category
Last pushed
Nov 24, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/asiff00/On-Device-Speech-to-Speech-Conversational-AI"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
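The same endpoint can be queried from Python with only the standard library. This is a minimal sketch: the URL is taken from the curl example above, but the shape of the returned JSON is not documented here, so the function simply returns the parsed payload.

```python
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/voice-ai/"
       "asiff00/On-Device-Speech-to-Speech-Conversational-AI")

def fetch_repo_quality(url=URL, timeout=10):
    """Fetch the repo-quality JSON from the API and return it as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# To retrieve and inspect the data:
#   print(json.dumps(fetch_repo_quality(), indent=2))
```

Without an API key this stays within the 100 requests/day limit noted above; with a free key, the same call allows 1,000/day.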
Related tools
VideotronicMaker/LM-Studio-Voice-Conversation
Python app for LM Studio-enhanced voice conversations with local LLMs. Uses Whisper for...
syntithenai/hermod
Voice services stack from audio hardware through hotword, ASR, NLU, AI routing and TTS bound by...
voice-engine/make-a-smart-speaker
A collection of resources to make a smart speaker
FR33TR1ST/VoiceAssistant
A voice assistant with Whisper speech recognition
bold-ronin/lira
A Voice-First AI Companion