	Voice Agent WebRTC + LangGraph (Quick Start)
This example launches a complete voice agent stack:
- LangGraph dev server for local agents
- Pipecat-based speech pipeline (WebRTC, ASR, LLM adapter, TTS)
- Static UI you can open in a browser
1) Mandatory environment variables
Create .env next to this README (or copy from env.example) and set at least the following (an example .env is sketched after the lists below):
- NVIDIA_API_KEY or RIVA_API_KEY: required for NVIDIA NIM-hosted Riva ASR/TTS
- USE_LANGGRAPH=true: enable the LangGraph-backed LLM
- LANGGRAPH_BASE_URL (default: http://127.0.0.1:2024)
- LANGGRAPH_ASSISTANT (default: ace-base-agent)
- USER_EMAIL (any email for routing, e.g. test@example.com)
- LANGGRAPH_STREAM_MODE (default: values)
- LANGGRAPH_DEBUG_STREAM (default: true)
Optional but commonly used:
- RIVA_ASR_LANGUAGE (default: en-US)
- RIVA_TTS_LANGUAGE (default: en-US)
- RIVA_TTS_VOICE_ID (e.g. Magpie-ZeroShot.Female-1)
- RIVA_TTS_MODEL (e.g. magpie_tts_ensemble-Magpie-ZeroShot)
- ZERO_SHOT_AUDIO_PROMPT if using Magpie Zero-shot and a custom voice prompt
- ZERO_SHOT_AUDIO_PROMPT_URL to auto-download the prompt on startup
- ENABLE_SPECULATIVE_SPEECH (default: true)
- TURN/Twilio for WebRTC if needed: TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, or TURN_SERVER_URL, TURN_USERNAME, TURN_PASSWORD
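For reference, an illustrative .env combining the required and common optional settings could look like this (values are placeholders, not working credentials):

# Required
NVIDIA_API_KEY=your-nvidia-api-key   # or RIVA_API_KEY
USE_LANGGRAPH=true
LANGGRAPH_BASE_URL=http://127.0.0.1:2024
LANGGRAPH_ASSISTANT=ace-base-agent
USER_EMAIL=test@example.com
LANGGRAPH_STREAM_MODE=values
LANGGRAPH_DEBUG_STREAM=true

# Optional
RIVA_ASR_LANGUAGE=en-US
RIVA_TTS_LANGUAGE=en-US
RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
ENABLE_SPECULATIVE_SPEECH=true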
2) What it does
- Starts the LangGraph dev server to serve local agents from agents/.
- Starts the Pipecat pipeline (pipeline.py), exposing:
  - HTTP: http://<host>:7860 (health and RTC config)
  - WebSocket: ws://<host>:7860/ws for audio and transcripts
- Serves the built UI at http://<host>:9000/ (via the container).
By default it uses:
- ASR: NVIDIA Riva (NIM) with RIVA_API_KEY and NVIDIA_ASR_FUNCTION_ID
- LLM: LangGraph adapter streaming from the selected assistant
- TTS: NVIDIA Riva Magpie (NIM) with RIVA_API_KEY and NVIDIA_TTS_FUNCTION_ID
3) Run
Option A: Docker (recommended)
From this directory:
docker compose up --build -d
Then open http://<machine-ip>:9000/.
Chrome on http origins: enable “Insecure origins treated as secure” at chrome://flags/ and add http://<machine-ip>:9000.
Option B: Python (local)
Requires Python 3.12 and uv.
uv run pipeline.py
Then start the UI from ui/ (see ui/README.md).
4) Swap TTS providers (Magpie ⇄ ElevenLabs)
The default TTS in pipeline.py is NVIDIA Riva Magpie via NIM:
tts = RivaTTSService(
    api_key=os.getenv("RIVA_API_KEY"),
    function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
    voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
    model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
    language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
    zero_shot_audio_prompt_file=(
        Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
    ),
)
To use ElevenLabs instead:
- Ensure the Pipecat ElevenLabs dependency is available (already included via the project dependencies).
- Set the environment variables:
  - ELEVENLABS_API_KEY
  - Optionally ELEVENLABS_VOICE_ID and any model settings supported by ElevenLabs
- Change the TTS construction in pipeline.py to use ElevenLabsTTSServiceWithEndOfSpeech:
from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech
# Replace RivaTTSService(...) with:
tts = ElevenLabsTTSServiceWithEndOfSpeech(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
    sample_rate=16000,
    channels=1,
)
That’s it. No other pipeline changes are required. The transcript synchronization already supports ElevenLabs end‑of‑speech events.
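If you switch back and forth often, one option is to select the provider at runtime via an environment variable instead of editing the constructor each time. A minimal sketch, assuming a hypothetical TTS_PROVIDER variable that is not part of this example's configuration:

import os
from pathlib import Path

from nvidia_pipecat.services.elevenlabs import ElevenLabsTTSServiceWithEndOfSpeech
# Reuse the RivaTTSService import that pipeline.py already has for the default path.

def build_tts():
    # TTS_PROVIDER is hypothetical; pipeline.py does not read it out of the box.
    provider = os.getenv("TTS_PROVIDER", "riva").lower()
    if provider == "elevenlabs":
        return ElevenLabsTTSServiceWithEndOfSpeech(
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            voice_id=os.getenv("ELEVENLABS_VOICE_ID", "Rachel"),
            sample_rate=16000,
            channels=1,
        )
    return RivaTTSService(
        api_key=os.getenv("RIVA_API_KEY"),
        function_id=os.getenv("NVIDIA_TTS_FUNCTION_ID", "4e813649-d5e4-4020-b2be-2b918396d19d"),
        voice_id=os.getenv("RIVA_TTS_VOICE_ID", "Magpie-ZeroShot.Female-1"),
        model=os.getenv("RIVA_TTS_MODEL", "magpie_tts_ensemble-Magpie-ZeroShot"),
        language=os.getenv("RIVA_TTS_LANGUAGE", "en-US"),
        zero_shot_audio_prompt_file=(
            Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT")) if os.getenv("ZERO_SHOT_AUDIO_PROMPT") else None
        ),
    )

tts = build_tts()

Setting TTS_PROVIDER=elevenlabs in .env would then select ElevenLabs without further pipeline edits.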
Notes for Magpie Zero‑shot:
- Provide RIVA_TTS_VOICE_ID like Magpie-ZeroShot.Female-1 and RIVA_TTS_MODEL like magpie_tts_ensemble-Magpie-ZeroShot.
- If using a custom voice prompt, mount it via docker-compose.yml and set ZERO_SHOT_AUDIO_PROMPT. You can also set ZERO_SHOT_AUDIO_PROMPT_URL to auto-download it at startup (the idea is sketched below).
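The auto-download amounts to roughly the following; this is a sketch of the idea only, not the container's actual startup script, and the default path zero_shot_prompt.wav is an illustrative placeholder:

# Sketch of the ZERO_SHOT_AUDIO_PROMPT_URL auto-download idea; the container's
# startup script is the real implementation and may differ in details.
import os
import urllib.request
from pathlib import Path

prompt_url = os.getenv("ZERO_SHOT_AUDIO_PROMPT_URL")
# "zero_shot_prompt.wav" is an illustrative default, not the container's actual path.
prompt_path = Path(os.getenv("ZERO_SHOT_AUDIO_PROMPT", "zero_shot_prompt.wav"))

if prompt_url and not prompt_path.exists():
    # Fetch the custom voice prompt once so the TTS service can read it locally.
    prompt_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(prompt_url, str(prompt_path))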
5) Troubleshooting
- Healthcheck: curl -f http://localhost:7860/get_prompt (a Python variant is sketched after this list)
- If the UI can't access the microphone over http, use the Chrome flag above or serve the UI via HTTPS.
- For NAT/firewall issues, configure TURN or Twilio credentials.
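Beyond the curl healthcheck, a small Python sketch can confirm both the HTTP endpoint and the audio WebSocket from section 2 are reachable. It assumes the requests and websockets packages are installed and the pipeline is running on the same host; it is not part of the example itself.

# Minimal reachability check for the pipeline's HTTP and WebSocket endpoints.
import asyncio
import requests
import websockets

def check_http() -> None:
    # /get_prompt is the same endpoint used by the curl healthcheck above.
    resp = requests.get("http://localhost:7860/get_prompt", timeout=5)
    print("HTTP health:", resp.status_code)

async def check_ws() -> None:
    # Only verifies that the audio/transcript WebSocket accepts connections, then closes.
    async with websockets.connect("ws://localhost:7860/ws"):
        print("WebSocket /ws: connected")

if __name__ == "__main__":
    check_http()
    asyncio.run(check_ws())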

