metadata
title: Project Charles
emoji: 👀
colorFrom: gray
colorTo: green
sdk: streamlit
python_version: 3.9.16
sdk_version: 1.22.0
app_file: app.py
pinned: false
license: mit

Project Charles

A toy app for a voice-based agent.

Video Demo -> Early Test

Required Environment Variables/Keys

  • OPENAI_API_KEY - required for ChatGPT
  • ELEVENLABS_API_KEY - required for ElevenLabs TTS

Optional Environment Variables/Keys

  • TWILIO_ACCOUNT_SID - reduces time for WebRTC connection
  • TWILIO_AUTH_TOKEN - reduces time for WebRTC connection
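
A quick way to confirm the keys above are set before launching is a small check like the one below. This is only a sketch: `check_environment` is a hypothetical helper, not part of the app, and it assumes the keys are read from the process environment.

```python
import os

# Key names taken from the two lists above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")
OPTIONAL_KEYS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def check_environment(env=None):
    """Hypothetical helper: return (missing required keys, missing optional keys)."""
    env = os.environ if env is None else env
    # A key counts as missing when it is absent or set to an empty string.
    missing_required = [key for key in REQUIRED_KEYS if not env.get(key)]
    missing_optional = [key for key in OPTIONAL_KEYS if not env.get(key)]
    return missing_required, missing_optional


def report(env=None):
    """Print a short summary of which keys still need to be set."""
    missing_required, missing_optional = check_environment(env)
    if missing_required:
        print("Missing required keys: " + ", ".join(missing_required))
    if missing_optional:
        print("Optional keys not set: " + ", ".join(missing_optional))
    return not missing_required
```

Calling `report()` with no arguments inspects the real environment. Passing a dictionary instead makes the check easy to try without touching any real keys.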

How to install

pip install -r requirements.txt

Install the system packages listed in packages.txt

macOS (Homebrew)

xargs brew install < packages.txt

Linux (Ubuntu, apt)

sudo xargs -a packages.txt apt-get install -y

Linux (Fedora, dnf)

sudo xargs -a packages.txt dnf install -y

Windows (Chocolatey)

Get-Content packages.txt | ForEach-Object { choco install $_ -y }
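
The per-platform commands above can also be assembled from one script. The sketch below is an illustration, not something shipped with the project: `read_packages`, `detect_manager`, `build_install_command`, and `main` are hypothetical names, and it assumes packages.txt holds whitespace-separated package names with no comments.

```python
import shutil
import subprocess
from pathlib import Path

# (prefix, suffix) argument lists mirroring the per-platform commands above.
INSTALL_TEMPLATES = {
    "brew": (["brew", "install"], []),
    "apt-get": (["sudo", "apt-get", "install", "-y"], []),
    "dnf": (["sudo", "dnf", "install", "-y"], []),
    "choco": (["choco", "install"], ["-y"]),
}


def read_packages(path="packages.txt"):
    """Return the whitespace-separated package names, as xargs would split them."""
    return Path(path).read_text().split()


def detect_manager():
    """Return the first supported package manager found on PATH, or None."""
    for name in INSTALL_TEMPLATES:
        if shutil.which(name):
            return name
    return None


def build_install_command(packages, manager):
    """Build the argument list that installs `packages` with `manager`."""
    prefix, suffix = INSTALL_TEMPLATES[manager]
    return prefix + list(packages) + suffix


def main():
    """Detect the package manager and install everything in packages.txt."""
    manager = detect_manager()
    if manager is None:
        raise SystemExit("No supported package manager found on PATH")
    subprocess.run(build_install_command(read_packages(), manager), check=True)
```

Nothing runs until `main()` is called. Unlike the Chocolatey one-liner, this passes all packages to a single install command rather than one command per package.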

How to run

streamlit run app.py

Known Issues

  • The first run may be slow because the model has to be downloaded. You may want to refresh the page after the first run.
  • Audio errors may occur because of the way the app converts the ElevenLabs stream to WebRTC audio.
  • Audio errors may also occur if the server is running slowly.
  • The app may hang, in which case the server needs a hard reset.

Architecture

Image of the architecture

Key Technologies:

  • Ray Actors & Queues - backbone of interprocess communication
  • Streamlit - UI & WebRTC connection
  • Vosk - speech to text
  • ChatGPT - text to text
  • ElevenLabs TTS - text to speech
  • Twilio - optional faster WebRTC connection
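
To give a feel for how queue-connected stages like these fit together, here is a minimal sketch. It deliberately uses Python's standard `queue` and `threading` modules as stand-ins for Ray actors and queues, and the three stage functions are placeholders rather than real Vosk, ChatGPT, or ElevenLabs calls. The linear chaining of stages is an assumption; the actual topology is whatever the architecture diagram shows.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def speech_to_text(audio_chunk):
    """Placeholder for the Vosk stage (no real recognition happens here)."""
    return f"transcript of {audio_chunk}"


def text_to_text(prompt):
    """Placeholder for the ChatGPT stage."""
    return f"reply to [{prompt}]"


def text_to_speech(text):
    """Placeholder for the ElevenLabs TTS stage."""
    return f"audio for [{text}]"


def run_stage(work, inbox, outbox):
    """Apply `work` to each item from `inbox` and forward the result to `outbox`."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(audio_chunks):
    """Chain the three stages with queues and collect the final outputs in order."""
    stages = [speech_to_text, text_to_text, text_to_speech]
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Each stage only ever touches its own inbox and outbox, so a slow stage delays the stages after it without blocking the ones before it. That decoupling is the general appeal of a queue-based backbone.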