---
title: Project Charles
emoji: 👀
colorFrom: gray
colorTo: green
sdk: streamlit
python_version: 3.11
sdk_version: 1.26.0
app_file: ui_app.py
pinned: true
license: mit
models: ["laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K"]
---
# Project Charles
A toy app for a voice-based agent.
Video demo: [Early Test](https://www.linkedin.com/posts/sohojoe_ray-vosk-chatgpt-activity-7100365711226671104-c2Nv?utm_source=share&utm_medium)
## Required Environment Variables/Keys
* OPENAI_API_KEY - required for ChatGPT
* ELEVENLABS_API_KEY - required for ElevenLabs TTS
## Optional Environment Variables/Keys
* TWILIO_ACCOUNT_SID - reduces time for WebRTC connection
* TWILIO_AUTH_TOKEN - reduces time for WebRTC connection
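
A small pre-flight check can confirm that the keys are set before the app is launched. The sketch below is not part of the repository: `check_env` is a hypothetical helper, and it assumes only the variable names listed above.

```python
import os

# Variable names taken from the lists above.
REQUIRED = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")
OPTIONAL = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def check_env(env=None):
    """Return (missing_required, missing_optional) for the given mapping.

    `check_env` is a hypothetical helper, not a function from this repository.
    Unset and empty values are both treated as missing.
    """
    env = os.environ if env is None else env
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_optional = [name for name in OPTIONAL if not env.get(name)]
    return missing_required, missing_optional


# Example with an explicit mapping instead of the real environment.
required, optional = check_env({"OPENAI_API_KEY": "sk-example"})
print("Missing required:", required)
print("Missing optional:", optional)
```

Calling `check_env()` with no argument inspects the real process environment, so it could be run once before `streamlit run ui_app.py`.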
## How to install
```bash
pip install -r requirements.txt
```
Then install the system packages listed in `packages.txt` for your platform:
macOS (Homebrew)
```bash
xargs brew install < packages.txt
```
Linux (Ubuntu, apt)
```bash
sudo xargs -a packages.txt apt-get install -y
```
Linux (Fedora, dnf)
```bash
sudo xargs -a packages.txt dnf install -y
```
Windows (Chocolatey, PowerShell)
```powershell
Get-Content packages.txt | ForEach-Object { choco install $_ -y }
```
## How to run
```bash
streamlit run ui_app.py
```
## Known Issues
* The first run may be slow because the model has to be downloaded. You may want to refresh the page after the first run.
* Audio errors may occur due to the way the app converts the ElevenLabs stream to WebRTC audio.
* Audio errors may also occur if the server is running slowly.
* The app may hang, in which case the server needs a hard reset.
## Architecture
![Image of the architecture](./images/ProjectCharlesCommunicationArchitecture.jpg)
Key Technologies:
* Ray Actors & Queues - backbone of interprocess communication
* Streamlit - UI & WebRTC connection
* Vosk - speech to text
* ChatGPT - text to text
* ElevenLabs TTS - text to speech
* Twilio - optional faster WebRTC connection
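
The speech-to-text, text-to-text and text-to-speech flow listed above can be pictured as stages connected by queues. The sketch below is only an illustration of that pattern: it swaps in standard-library threads and `queue.Queue` for Ray actors and queues so that it runs without Ray, and all function names and the placeholder workers are hypothetical rather than taken from this repository.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def stage(worker, inbox, outbox):
    """Apply `worker` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(worker(item))


# Placeholder workers (hypothetical). The real app would call
# Vosk, ChatGPT and ElevenLabs at these points.
def speech_to_text(audio):
    return f"text({audio})"


def respond(prompt):
    return f"reply({prompt})"


def text_to_speech(text):
    return f"audio({text})"


def run_pipeline(audio_chunks):
    """Push chunks through the three stages and collect the results in order."""
    workers = [speech_to_text, respond, text_to_speech]
    # One queue in front of each stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(STOP)

    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results


print(run_pipeline(["chunk-1", "chunk-2"]))
```

Because each stage only talks to its neighbours through a queue, a slow stage delays the stages after it without blocking the ones before it, which is one reason a queue-based design suits streaming audio.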