
Speech to Speech UI

This is a UI for ACE's speech-to-speech pipeline using WebRTC.

Requirements

  • Node.js
  • npm or a compatible package manager.

All frontend dependencies are listed in ui/package.json.

Configuration

Using a TURN Server

A TURN server is needed for WebRTC connections when clients are behind NATs or firewalls that prevent direct peer-to-peer communication. The TURN server acts as a relay to ensure connectivity in restrictive network environments.

  1. Open src/config.ts
  2. Set RTC_CONFIG as shown below and fill in your TURN server details (a filled-in example follows these steps)
export const RTC_CONFIG: ConstructorParameters<typeof RTCPeerConnection>[0] = {
  iceServers: [
    {
      urls: "<turn_server_url>",
      username: "<turn_server_username>",
      credential: "<turn_server_credential>",
    },
  ],
};
  3. Save the file and restart the development server
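
For reference, a filled-in configuration might look like the following. The TURN URL and credentials here are placeholders for illustration only, not real values:

export const RTC_CONFIG: ConstructorParameters<typeof RTCPeerConnection>[0] = {
  iceServers: [
    {
      // Placeholder TURN deployment — replace with your own server and credentials.
      urls: "turn:turn.example.com:3478",
      username: "webrtc-user",
      credential: "webrtc-secret",
    },
  ],
};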

Dynamic Prompt

  • Default mode (DYNAMIC_PROMPT = false): Uses preset prompts from the backend API
  • Dynamic mode (DYNAMIC_PROMPT = true): Allows users to input and send custom prompts via the UI

To enable dynamic prompt ingestion from the UI:

  1. Open src/config.ts
  2. Set DYNAMIC_PROMPT to true:
// Set to true to use dynamic prompt mode, false for default mode
export const DYNAMIC_PROMPT = true;
  3. Save the file and restart the development server

When dynamic mode is enabled:

  • Users can input custom prompts directly in the UI
  • A "Show Prompt"/"Hide Prompt" toggle button is available
  • Prompts are sent to the backend for processing once the WebRTC connection is established (see the sketch below)
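
The exact channel name and message schema are defined by the application code and backend; the sketch below only illustrates the general WebRTC pattern, and the "prompt" channel label and JSON payload are assumptions made for this example:

// Illustrative sketch only — the real channel name and message schema live in
// the app and backend code. Signaling (offer/answer exchange) is omitted here.
import { RTC_CONFIG } from "./config";

const customPrompt = "You are a friendly retail assistant."; // hypothetical user input

const pc = new RTCPeerConnection(RTC_CONFIG);
const promptChannel = pc.createDataChannel("prompt"); // channel label is assumed

promptChannel.onopen = () => {
  // Send the user's custom prompt once the connection is up.
  promptChannel.send(JSON.stringify({ type: "prompt", text: customPrompt }));
};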

Run locally

npm install
npm run dev

Then browse to http://localhost:5173/.

Run in production

Create a production build:

npm install
npm run build

This creates an optimized, minified build in ./dist. The contents of this folder can be served as a static website (AWS S3, GitHub Pages, any HTTP server, etc.). For example:

python -m http.server --directory dist # serves the build at `localhost:8000`
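
If you prefer to stay within the Node toolchain, a minimal static file server along the following lines also works. This is an illustrative sketch (run with e.g. npx tsx serve-dist.ts), not part of the project, and it does no path sanitization or caching:

// serve-dist.ts — minimal static server for the production build (illustration only).
import { createServer } from "node:http";
import { readFile } from "node:fs/promises";
import { extname, join } from "node:path";

const DIST = "./dist";
const MIME: Record<string, string> = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".css": "text/css",
  ".svg": "image/svg+xml",
  ".json": "application/json",
};

createServer(async (req, res) => {
  // Map the request path onto ./dist, defaulting to index.html.
  const pathname = new URL(req.url ?? "/", "http://localhost").pathname;
  const filePath = join(DIST, pathname === "/" ? "index.html" : pathname);
  try {
    const body = await readFile(filePath);
    res.writeHead(200, { "Content-Type": MIME[extname(filePath)] ?? "application/octet-stream" });
    res.end(body);
  } catch {
    res.writeHead(404);
    res.end("Not found");
  }
}).listen(8000, () => console.log("Serving dist at http://localhost:8000"));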

See below for serving the production build on non-localhost origins.

Run outside localhost

The UI needs access to the user's microphone, which browsers only allow in secure contexts such as http://localhost and https://<example>. If the UI is served from a non-secure context (typically, a non-HTTPS URL), it shows the error Cannot read properties of undefined (reading 'getUserMedia').
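
The error occurs because navigator.mediaDevices is undefined outside secure contexts; a guard along these lines (illustrative only, not the project's actual code) makes the failure explicit:

// navigator.mediaDevices only exists in secure contexts (HTTPS or localhost),
// so check for it before requesting the microphone.
async function getMicrophone(): Promise<MediaStream> {
  if (!navigator.mediaDevices?.getUserMedia) {
    throw new Error("Microphone access requires a secure context (HTTPS or localhost).");
  }
  return navigator.mediaDevices.getUserMedia({ audio: true });
}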

Below are a few options to work around this limitation:

  • Allowlist the URL in Chrome. This is the easiest approach, but not suitable for production, as it requires users to modify their browser settings. In Chrome, browse to chrome://flags/#unsafely-treat-insecure-origin-as-secure and enable the setting. In the textbox, add the URL origin that serves the UI. For example, if the UI is served at http://<HOST_IP>:8000/speech, enter http://<HOST_IP>:8000. Then, restart Chrome and browse to the UI again.

  • Serve the UI from a hosting provider with HTTPS support. This is an easy approach that is suitable for production, but requires hosting the UI on a third-party provider (for example, GitHub Pages or Netlify).

  • Self-managed webserver with SSL. This approach is suitable for production and doesn't require a third-party hosting provider, but is harder to implement and manage (for example, nginx or Caddy with Let's Encrypt certificates).