---
title: Project Charles
emoji: 👀
colorFrom: gray
colorTo: green
sdk: streamlit
python_version: 3.9.16
sdk_version: 1.22.0
app_file: app.py
pinned: false
license: mit
---

# Project Charles

A toy app for a voice-based agent.

Video demo: [Early Test](https://www.linkedin.com/posts/sohojoe_ray-vosk-chatgpt-activity-7100365711226671104-c2Nv?utm_source=share&utm_medium)

## Required Environment Variables/Keys

* OPENAI_API_KEY - required for ChatGPT
* ELEVENLABS_API_KEY - required for ElevenLabs TTS

## Optional Environment Variables/Keys

* TWILIO_ACCOUNT_SID - reduces time for WebRTC connection
* TWILIO_AUTH_TOKEN - reduces time for WebRTC connection
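
As a quick sanity check before launching the app, the variables listed above can be verified from Python. This is a minimal sketch, not part of the project: the `check_environment` helper is hypothetical, and only the variable names come from the lists above.

```python
import os

# Variable names are taken from the lists above; the helper itself is hypothetical.
REQUIRED_VARS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")
OPTIONAL_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def check_environment(env=None):
    """Return (missing_required, missing_optional) as lists of variable names."""
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both count as missing.
    missing_required = [name for name in REQUIRED_VARS if not env.get(name)]
    missing_optional = [name for name in OPTIONAL_VARS if not env.get(name)]
    return missing_required, missing_optional
```

Calling `check_environment()` with no arguments inspects the current process environment. An empty first list means both required keys are set; names in the second list only indicate that the optional Twilio credentials are absent.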

## How to install

```bash
pip install -r requirements.txt
```

Then install the system packages listed in `packages.txt` using your platform's package manager.

macOS (Homebrew)
```bash
xargs brew install < packages.txt
```

Linux (Ubuntu, apt)
```bash
sudo xargs -a packages.txt apt-get install -y
```

Linux (Fedora, dnf)
```bash
sudo xargs -a packages.txt dnf install -y
```

Windows (Chocolatey, run in PowerShell)
```powershell
Get-Content packages.txt | ForEach-Object { choco install $_ -y }
```



## How to run

```bash
streamlit run app.py
```

## Known Issues

* The first run may be slow because the model has to be downloaded. You may want to refresh the page after the first run.
* Audio errors may occur because of how the app converts the ElevenLabs stream to WebRTC audio.
* Audio errors may also occur if the server is running slowly.
* The app may hang, in which case the server needs a hard reset.

## Architecture

![Image of the architecture](./images/ProjectCharlesCommunicationArchitecture.jpg)

Key Technologies:

* Ray Actors & Queues - backbone of interprocess communication
* Streamlit - UI & WebRTC connection
* Vosk - speech to text
* ChatGPT - text to text
* ElevenLabs TTS - text to speech
* Twilio - optional faster WebRTC connection
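
To illustrate the general shape of the pipeline listed above (speech to text, then text to text, then text to speech, with queues between stages), here is a minimal sketch. It is not the project's code: it uses Python's standard `threading` and `queue` modules as stand-ins for Ray actors and queues, and the three stage functions are hypothetical placeholders for Vosk, ChatGPT, and ElevenLabs.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(transform, inbox, outbox):
    """Read items from inbox, apply transform, and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            break
        outbox.put(transform(item))


# Hypothetical placeholders for the real services (Vosk, ChatGPT, ElevenLabs).
def speech_to_text(audio):
    return f"text({audio})"


def chat(prompt):
    return f"reply({prompt})"


def text_to_speech(text):
    return f"audio({text})"


def run_pipeline(audio_chunks):
    """Push audio chunks through the three stages and collect the outputs."""
    transforms = [speech_to_text, chat, text_to_speech]
    # One queue in front of each stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(transforms)
    ]
    for worker in workers:
        worker.start()
    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Each stage runs independently and only communicates through its input and output queues, so a slow stage delays downstream output without blocking upstream producers. In the actual app, Ray actors and queues play this role across processes rather than threads.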