opendilab/CleanS2S
High-quality, streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice-interaction prototype agent implemented in just one file!
Implements full-duplex streaming S2S interactions through asynchronous multi-threaded pipelines combining ASR (Paraformer), LLM, and TTS (CosyVoice) components, with WebSocket-based real-time audio/text transport and user interruption handling. Extends the core pipeline with Subjective Action Judgement for proactive agent behavior, plus optional web search and RAG integration for grounding responses in external knowledge. Designed as a minimal, single-file reference implementation targeting researchers validating S2S architectures and LUI (Linguistic User Interface) concepts.
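The pipeline shape described above (ASR, then LLM, then TTS, each running on its own thread, with user interruption discarding stale work) can be illustrated with a minimal sketch. This is not CleanS2S's actual code: `run_stage`, `start_pipeline`, `drain`, and the three `fake_*` handlers are hypothetical names, and the stubs stand in for Paraformer, the LLM, and CosyVoice.

```python
import queue
import threading

STOP = object()  # sentinel that shuts a stage down


def run_stage(handler, inbox, outbox, interrupted):
    """Pull items from inbox, transform them, and push results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        if interrupted.is_set():
            continue  # user interrupted: drop stale work instead of voicing it
        outbox.put(handler(item))


def start_pipeline(asr, llm, tts):
    """Wire three stages together with queues, one thread per stage."""
    interrupted = threading.Event()
    # Four queues: mic audio -> recognized text -> reply text -> speaker audio
    queues = [queue.Queue() for _ in range(4)]
    threads = []
    for handler, inbox, outbox in zip((asr, llm, tts), queues, queues[1:]):
        t = threading.Thread(
            target=run_stage,
            args=(handler, inbox, outbox, interrupted),
            daemon=True,
        )
        t.start()
        threads.append(t)
    return queues[0], queues[-1], interrupted, threads


def drain(q):
    """Collect everything from a queue until the STOP sentinel arrives."""
    items = []
    while True:
        item = q.get()
        if item is STOP:
            return items
        items.append(item)


# Hypothetical stand-ins for the real ASR, LLM, and TTS components.
def fake_asr(chunk):
    return chunk.decode("utf-8")


def fake_llm(text):
    return f"You said: {text}"


def fake_tts(reply):
    return reply.encode("utf-8")
```

Feeding `b"hello"` followed by `STOP` into the first queue and calling `drain` on the last queue yields the synthesized reply. Setting the `interrupted` event before feeding input causes every stage to discard what it receives, which is a simplified stand-in for interruption handling; a real system would presumably also cancel in-flight generation and clear the event when the next user turn begins.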
Stars: 499
Forks: 52
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/opendilab/CleanS2S"
Open to everyone: 100 requests/day without a key. A free key raises the limit to 1,000 requests/day.
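The same request can be made from Python using only the standard library. This is a rough sketch under two assumptions that the listing does not confirm: that the endpoint returns JSON, and that the path follows a category/owner/repo pattern (inferred from the single URL shown). `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl example; the path layout beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record and decode it, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("voice-ai", "opendilab", "CleanS2S")` requests the same URL as the curl command. The response fields are not documented here, so inspect the returned object before relying on any particular key, and keep the unauthenticated limit of 100 requests/day in mind when looping over many repositories.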
Higher-rated alternatives
alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
huggingface/speech-to-speech
Build local voice agents with open-source models
linto-ai/WebVoiceSDK
Building blocks for voice-enabled applications in the browser
Picovoice/speech-to-text-benchmark
Speech-to-text benchmark framework
vox-serve/vox-serve
A Streaming-Native Serving Engine for TTS/STS Models