metadata

title: Project Charles
emoji: 👀
colorFrom: gray
colorTo: green
sdk: streamlit
python_version: 3.11
sdk_version: 1.26.0
app_file: ui_app.py
pinned: true
license: mit
models:
  - laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K

Project Charles

A toy app for a voice-based agent.

Video Demo -> Early Test

Required Environment Variables/Keys

OPENAI_API_KEY - required for ChatGPT
ELEVENLABS_API_KEY - required for ElevenLabs TTS
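
A missing key is easier to diagnose at startup than partway through a conversation. The sketch below shows one way to check for the two required keys. The helper name `missing_keys` is hypothetical and is not part of this project.

```python
import os

# The two keys this README lists as required.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env`.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to test. (Hypothetical helper, not project code.)
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

For example, calling `missing_keys()` before `streamlit run ui_app.py` and printing the result shows which keys still need to be exported.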

Optional Environment Variables/Keys

TWILIO_ACCOUNT_SID - reduces time for WebRTC connection
TWILIO_AUTH_TOKEN - reduces time for WebRTC connection
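
Because the Twilio keys are optional, the app needs a fallback when they are absent. The sketch below shows a common pattern for WebRTC apps: request ICE servers from Twilio when both keys are set, otherwise use a public STUN server. It is an assumption about how such a fallback could look, not this project's code. The function name `get_ice_servers` and the choice of Google's public STUN server are illustrative.

```python
import os

# Fallback used when the Twilio keys are absent.
# Assumption: a public STUN server; the project may use a different one.
DEFAULT_ICE_SERVERS = [{"urls": ["stun:stun.l.google.com:19302"]}]


def get_ice_servers(env=None):
    """Return ICE server settings for a WebRTC connection.

    Uses Twilio's Network Traversal Service when both Twilio keys are
    present in `env`, and falls back to DEFAULT_ICE_SERVERS otherwise.
    (Hypothetical helper, not project code.)
    """
    env = os.environ if env is None else env
    account_sid = env.get("TWILIO_ACCOUNT_SID")
    auth_token = env.get("TWILIO_AUTH_TOKEN")
    if not (account_sid and auth_token):
        return DEFAULT_ICE_SERVERS

    # Imported lazily so the twilio package is only needed when keys are set.
    from twilio.rest import Client

    client = Client(account_sid, auth_token)
    # tokens.create() returns a short-lived token that carries ICE servers.
    return client.tokens.create().ice_servers
```

The returned list can then be passed as the `iceServers` entry of the WebRTC configuration.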

How to install

pip install -r requirements.txt

Install the system packages listed in packages.txt

macOS (Homebrew)

xargs brew install < packages.txt

Linux (Ubuntu, apt)

sudo xargs -a packages.txt apt-get install -y

Linux (Fedora, dnf)

sudo xargs -a packages.txt dnf install -y

Windows (Chocolatey)

Get-Content packages.txt | ForEach-Object { choco install $_ -y }

How to run

streamlit run ui_app.py

Known Issues

The first run may be slow because the model has to be downloaded. You may want to refresh the page after the first run.
Audio errors may occur because of the way the app converts the ElevenLabs stream to WebRTC audio.
Audio errors may also occur if the server is running slowly.
The app may hang, in which case the server needs a hard reset.

Architecture

Key Technologies:

Ray Actors & Queues - backbone of interprocess communication
Streamlit - UI & WebRTC connection
Vosk - speech to text
ChatGPT - text to text
ElevenLabs TTS - text to speech
Twilio - optional faster WebRTC connection
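
The list above describes a pipeline: speech to text, then text to text, then text to speech, with queues carrying data between stages. The sketch below illustrates that shape only. It swaps in Python's standard `queue.Queue` and threads for Ray Actors and Queues so that it runs without extra dependencies, and the three transform functions are placeholders rather than real Vosk, ChatGPT, or ElevenLabs calls. All names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(transform, inbox, outbox):
    """Read items from inbox, apply transform, and put results on outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(transform(item))


# Placeholder transforms standing in for the real services.
def speech_to_text(audio):
    return f"text({audio})"


def chat(text):
    return f"reply({text})"


def text_to_speech(reply):
    return f"audio({reply})"


def run_pipeline(audio_chunks):
    """Push audio chunks through the three stages and collect the output."""
    transforms = [speech_to_text, chat, text_to_speech]
    # One queue in front of each stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(transforms)
    ]
    for worker in workers:
        worker.start()
    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(STOP)

    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Each stage only knows about its inbox and outbox, so a slow stage delays the stages after it without blocking the ones before it. With Ray, each stage would presumably become an actor and each `queue.Queue` a Ray queue, but that mapping is an assumption.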