VOICE · REAL-TIME AGENT

Chinese real-time voice assistant, in production.

End-to-end STT → LLM → TTS pipeline with ~600 ms time to first audio (TTFA), barge-in, and multi-turn context. Built on Aliyun Paraformer ASR, Qwen, and CosyVoice TTS. Already deployed across three verticals.

LIVE DEMO

Talk to it directly in your browser.

All three verticals are live to try: restaurant, salon, and clinic. Real pipeline, not a mock.

Try the voice demo
TTFA (time to first audio): ~600 ms
Barge-in response: < 200 ms
Languages: zh · yue · en (Mandarin, Cantonese, English)
Production recognition accuracy: 99.5%+
DEPLOYED SCENARIOS · 3

One foundation, three live verticals.

Each scenario went from zero to production in 4–6 weeks. Give us a new vertical and your domain corpus, and our FDE team will repeat the same cadence.

TECHNICAL STACK

What's underneath.

STT
Aliyun Paraformer (real-time) · mixed Mandarin, Cantonese, and English
LLM
Qwen-Max (default) · swappable for GPT, Claude, or domestic models · scenario prompt library + tool calling
TTS
Aliyun CosyVoice · voice selection · emotion
TRANSPORT
LiveKit Cloud Agent · WebRTC + SSE edge token
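A streaming voice turn generally works like this: LLM tokens are handed to TTS as they arrive, so audio can start before the reply is finished, and a barge-in cancels whatever reply is still in flight. Below is a minimal Python asyncio sketch of that control flow. It is not the production implementation. The names `llm_stream`, `tts_chunk`, `speak_reply`, and `run_turn` are hypothetical stubs, not Paraformer, Qwen, CosyVoice, or LiveKit APIs, and the sleep delays are made up. They say nothing about the real ~600 ms TTFA.

```python
import asyncio
from typing import AsyncIterator, List, Optional

# HYPOTHETICAL stubs: they only simulate streaming latency. They are not the
# Paraformer, Qwen, CosyVoice, or LiveKit SDK calls.
REPLY_TOKENS = ["您好，", "请问", "几位", "用餐？"]


async def llm_stream(prompt: str) -> AsyncIterator[str]:
    """Stub LLM: ignores the prompt and yields a canned reply one token at a time."""
    for token in REPLY_TOKENS:
        await asyncio.sleep(0.05)  # simulated per-token generation delay
        yield token


async def tts_chunk(text: str) -> bytes:
    """Stub TTS: turns a text chunk into placeholder 'audio' bytes."""
    await asyncio.sleep(0.02)  # simulated synthesis delay
    return text.encode("utf-8")


async def speak_reply(prompt: str, played: List[bytes]) -> None:
    """Feed LLM tokens into TTS as they arrive, so audio starts before the reply is complete."""
    async for token in llm_stream(prompt):
        played.append(await tts_chunk(token))  # stands in for sending audio to the caller


async def run_turn(prompt: str, barge_in_after: Optional[float] = None) -> List[bytes]:
    """Run one assistant turn and cancel it if the user starts speaking (barge-in)."""
    played: List[bytes] = []
    reply = asyncio.create_task(speak_reply(prompt, played))
    if barge_in_after is None:
        await reply  # no interruption: let the whole reply play out
        return played
    await asyncio.sleep(barge_in_after)  # simulated VAD "user started talking" event
    if not reply.done():
        reply.cancel()  # stop LLM generation and any pending TTS at once
        try:
            await reply
        except asyncio.CancelledError:
            pass
    return played
```

`asyncio.run(run_turn("两位"))` plays all four chunks. `asyncio.run(run_turn("两位", barge_in_after=0.1))` plays fewer, because the reply task is cancelled mid-stream. In a real deployment, the cancel would be triggered by a VAD or partial-transcript event, not by a timer.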
YOUR SCENARIO

Want this pipeline on your own business?

Common adaptations: customer support hotlines · outbound sales · after-sales follow-up · industry advisory hotlines. Our FDE team ships a new vertical in 4–6 weeks.