asiff00/On-Device-Speech-to-Speech-Conversational-AI
An on-CPU, real-time conversational system for two-way speech communication with AI models. A continuous streaming architecture keeps conversations fluid, with immediate responses and natural interruption handling.
Implements a modular multi-threaded pipeline combining Pyannote VAD, Whisper speech recognition, Ollama/LM Studio language models, and Kokoro TTS, all running locally on CPU. Uses priority-based text chunking that streams TTS output as language model tokens arrive, and strategically injects filler words into LLM prompts to reduce perceived latency by 50-70%. Achieves ~1.5s end-to-end response time on mid-range hardware through queue-based inter-component coordination and user interrupt handling.
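The core pattern described above, streaming LLM tokens into a TTS queue one sentence at a time, with an interrupt flag, can be sketched in a few lines. This is a hypothetical illustration of the technique, not the repository's actual code: the token list, `chunk_tokens`, and `tts_worker` are stand-ins, and a real implementation would call the Kokoro TTS engine where this sketch just collects chunks.

```python
import queue
import threading

SENTENCE_ENDS = {".", "!", "?"}

def chunk_tokens(tokens, tts_queue, stop_event):
    """Accumulate streamed tokens; flush a chunk to TTS at each sentence end."""
    buf = []
    for tok in tokens:
        if stop_event.is_set():          # user interrupt: drop remaining tokens
            break
        buf.append(tok)
        if tok.strip() and tok.strip()[-1] in SENTENCE_ENDS:
            tts_queue.put("".join(buf).strip())
            buf = []
    if buf and not stop_event.is_set():  # flush any trailing partial sentence
        tts_queue.put("".join(buf).strip())
    tts_queue.put(None)                  # sentinel: stream finished

def tts_worker(tts_queue, spoken):
    """Consume chunks as they arrive; a real worker would synthesize audio here."""
    while True:
        chunk = tts_queue.get()
        if chunk is None:
            break
        spoken.append(chunk)             # stand-in for TTS playback

def run_pipeline(tokens):
    tts_queue = queue.Queue()
    stop_event = threading.Event()
    spoken = []
    worker = threading.Thread(target=tts_worker, args=(tts_queue, spoken))
    worker.start()
    chunk_tokens(tokens, tts_queue, stop_event)
    worker.join()
    return spoken

tokens = ["Sure", ",", " let", " me", " check", ".", " One", " moment", "!"]
print(run_pipeline(tokens))              # first sentence is speakable before the stream ends
```

Because each chunk is enqueued as soon as its closing punctuation arrives, speech for the first sentence can begin while the model is still generating the rest, which is what hides most of the generation latency.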
Stars
242
Forks
43
Language
Python
License
MIT
Category
Last pushed
Nov 24, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/asiff00/On-Device-Speech-to-Speech-Conversational-AI"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
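The same endpoint can be queried from Python with only the standard library. This is a minimal sketch: the URL is taken from the curl example above, but the shape of the returned JSON is not documented here, so the function simply returns the parsed payload.

```python
import json
import urllib.request

URL = ("https://pt-edge.onrender.com/api/v1/quality/voice-ai/"
       "asiff00/On-Device-Speech-to-Speech-Conversational-AI")

def fetch_repo_quality(url=URL, timeout=10):
    """Fetch the repo-quality JSON from the API and return it as a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# To retrieve and inspect the data:
#   print(json.dumps(fetch_repo_quality(), indent=2))
```

Without an API key this stays within the 100 requests/day limit noted above; with a free key, the same call allows 1,000/day.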
Related tools
VideotronicMaker/LM-Studio-Voice-Conversation
Python app for LM Studio-enhanced voice conversations with local LLMs. Uses Whisper for...
syntithenai/hermod
Voice services stack from audio hardware through hotword, ASR, NLU, AI routing and TTS bound by...
voice-engine/make-a-smart-speaker
A collection of resources to make a smart speaker
FR33TR1ST/VoiceAssistant
A voice assistant with Whisper speech recognition
bold-ronin/lira
A Voice-First AI Companion