opendilab/CleanS2S

High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!

/ 100

Emerging

Implements full-duplex streaming S2S interactions through asynchronous multi-threaded pipelines combining ASR (Paraformer), LLM, and TTS (CosyVoice) components, with WebSocket-based real-time audio/text transport and user interruption handling. Extends the core pipeline with Subjective Action Judgement for proactive agent behavior, plus optional web search and RAG integration for grounding responses in external knowledge. Designed as a minimal, single-file reference implementation targeting researchers validating S2S architectures and LUI (Linguistic User Interface) concepts.
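The pipeline shape described above (one thread per stage, queues between stages, and a shared flag for user interruption) can be sketched roughly as follows. This is an illustrative sketch only, not code from CleanS2S: `fake_asr`, `fake_llm`, `fake_tts`, `stage`, and `run_pipeline` are hypothetical names, the stubs stand in for Paraformer, the LLM, and CosyVoice, and the WebSocket transport is omitted.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def fake_asr(audio_chunk):
    # Hypothetical stand-in for a speech recognizer: bytes -> text.
    yield audio_chunk.decode("utf-8")


def fake_llm(text):
    # Hypothetical stand-in for a streaming LLM: yields pieces one by one.
    for piece in ("echo", text):
        yield piece


def fake_tts(sentence):
    # Hypothetical stand-in for a speech synthesizer: text -> bytes.
    yield sentence.encode("utf-8")


def stage(fn, inbox, outbox, interrupted):
    """Apply fn to each item from inbox and stream its results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown downstream
            return
        if interrupted.is_set():
            continue  # the user interrupted: drop stale work
        for result in fn(item):
            if interrupted.is_set():
                break  # stop producing output mid-stream
            outbox.put(result)


def run_pipeline(audio_chunks, interrupted=None):
    """Push chunks through ASR -> LLM -> TTS, each stage on its own thread."""
    interrupted = interrupted or threading.Event()
    fns = [fake_asr, fake_llm, fake_tts]
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(
            target=stage, args=(fn, queues[i], queues[i + 1], interrupted)
        )
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(STOP)
    for t in threads:
        t.join()
    # Drain the final queue up to the sentinel.
    out = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            return out
        out.append(item)
```

Calling `run_pipeline([b"hi"])` pushes one chunk through all three stages. Passing an already-set `threading.Event` as `interrupted` makes every stage discard its input, which is one simple way to model a user cutting the agent off mid-response.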

499 stars.

No Package

No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25


Stars

499

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

huggingface/speech-to-speech

Build local voice agents with open-source models

linto-ai/WebVoiceSDK

Building blocks for voice-enabled applications in the browser

Picovoice/speech-to-text-benchmark

speech to text benchmark framework

vox-serve/vox-serve

A Streaming-Native Serving Engine for TTS/STS Models
