jpc committed on
Commit
ad3d1f7
1 Parent(s): 3ec0fd4

Automate Docker building

README.md CHANGED
@@ -1,29 +1,52 @@
1
  # WhisperBot
2
- Welcome to WhisperBot. WhisperBot builds upon the capabilities of the [WhisperLive](https://github.com/collabora/WhisperLive) and [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper, a powerful automatic speech recognition (ASR) system. Both Mistral and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.
3
 
4
  ## Features
5
- - **Real-Time Speech-to-Text**: Uses WhisperLive to convert spoken language into text in real-time.
6
 
7
- - **Large Language Model Integration**: Adds Mistral, a Large Language Model, to enhance the understanding and context of the transcribed text.
 
 
 
 
 
8
 
9
- - **TensorRT Optimization**: Both Mistral and Whisper are optimized to run as TensorRT engines, ensuring high-performance and low-latency processing.
 
 
10
 
11
  ## Prerequisites
12
- Install [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md) to build Whisper and Mistral TensorRT engines. The README builds a docker image for TensorRT-LLM.
13
- Instead of building a docker image, we can also refer to the README and the [Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi) to install the required packages in the base PyTorch docker image. Just make sure to use the correct base image as mentioned in the dockerfile and the installation should go smoothly.
14
 
15
  ### Build Whisper TensorRT Engine
16
 
17
  > [!NOTE]
18
  >
19
- > These steps are included in `setup/setup-tensorrt-llm.sh`
20
 
21
  Change working dir to the [whisper example
22
  dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper)
23
  in TensorRT-LLM.
24
 
25
  ``` bash
26
- cd TensorRT-LLM/examples/whisper
27
  ```
28
 
29
  Currently, by default TensorRT-LLM only supports `large-v2` and
@@ -33,7 +56,7 @@ Download the required assets
33
 
34
  ``` bash
35
  # the sound filter definitions
36
- wget --directory-prefix=assets assets/mel_filters.npz https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
37
  # the small.en model weights
38
  wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
39
  ```
@@ -62,15 +85,23 @@ model:
62
  ``` bash
63
  pip install -r requirements.txt
64
  python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
 
 
65
  ```
66
 
67
  ### Build Mistral TensorRT Engine
68
- - Change working dir to [llama example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama) in TensorRT-LLM folder.
69
- ```bash
70
- cd TensorRT-LLM/examples/llama
 
 
 
 
71
  ```
72
- - Convert Mistral to `fp16` TensorRT engine.
73
- ```bash
 
 
74
  python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
75
  --dtype float16 \
76
  --remove_input_padding \
@@ -78,20 +109,30 @@ python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
78
  --enable_context_fmha \
79
  --use_gemm_plugin float16 \
80
  --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
81
- --max_input_len 5000
82
  --max_batch_size 1
 
 
83
  ```
84
 
85
  ### Build Phi TensorRT Engine
86
- Note: Phi is only available in the main branch and hasn't been released yet. So, make sure to build TensorRT-LLM from the main branch.
87
- - Change working dir to [phi example dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/phi) in TensorRT-LLM folder.
88
- ```bash
89
- cd TensorRT-LLM/examples/phi
 
 
 
 
 
 
90
  ```
91
- - Build phi TensorRT engine
92
- ```bash
 
 
93
  git lfs install
94
- git clone https://huggingface.co/microsoft/phi-2
95
  python3 build.py --dtype=float16 \
96
  --log_level=verbose \
97
  --use_gpt_attention_plugin float16 \
@@ -99,46 +140,95 @@ python3 build.py --dtype=float16 \
99
  --max_batch_size=16 \
100
  --max_input_len=1024 \
101
  --max_output_len=1024 \
102
- --output_dir=phi_engine \
103
- --model_dir=phi-2>&1 | tee build.log
 
 
 
 
 
104
  ```
105
 
106
- ## Run WhisperBot
107
- - Clone this repo and install requirements.
108
- ```bash
109
- git clone https://github.com/collabora/WhisperBot.git
 
 
 
 
 
 
110
  cd WhisperBot
111
  apt update
112
  apt install ffmpeg portaudio19-dev -y
113
  pip install -r requirements.txt
 
114
  ```
115
 
116
- ### Whisper + Mistral
117
- - Take the folder path for Whisper TensorRT model, folder_path and tokenizer_path for Mistral TensorRT from the build phase. If a huggingface model is used to build mistral then just use the huggingface repo name as the tokenizer path.
118
- ```bash
119
- python3 main.py --mistral
120
- --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
121
- --mistral_tensorrt_path /root/TensorRT-LLM/examples/llama/tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
122
- --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
 
 
 
 
 
123
  ```
124
 
125
- ### Whisper + Phi
126
- - Take the folder path for Whisper TensorRT model, folder_path and tokenizer_path for Phi TensorRT from the build phase. If a huggingface model is used to build phi then just use the huggingface repo name as the tokenizer path.
127
- ```bash
128
- python3 main.py --phi
129
- --whisper_tensorrt_path /root/TensorRT-LLM/examples/whisper/whisper_small_en \
130
- --phi_tensorrt_path /root/TensorRT-LLM/examples/phi/phi_engine \
131
- --phi_tokenizer_path /root/TensorRT-LLM/examples/phi/phi-2
132
  ```
133
 
134
- - On the client side clone the repo, install the requirements and execute `run_client.py`
135
- ```bash
 
 
136
  cd WhisperBot
137
  pip install -r requirements.txt
138
  python3 run_client.py
139
  ```
140
 
141
-
142
  ## Contact Us
143
- For questions or issues, please open an issue.
144
- Contact us at: marcus.edel@collabora.com, jpc@collabora.com, vineet.suryan@collabora.com
 
 
 
1
  # WhisperBot
2
+
3
+
4
+ Welcome to WhisperBot. WhisperBot builds upon the capabilities of the
5
+ [WhisperLive](https://github.com/collabora/WhisperLive) and
6
+ [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by
7
+ integrating Mistral, a Large Language Model (LLM), on top of the
8
+ real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper,
9
+ a powerful automatic speech recognition (ASR) system. Both Mistral and
10
+ Whisper are optimized to run efficiently as TensorRT engines, maximizing
11
+ performance and real-time processing capabilities.
12
 
13
  ## Features
 
14
 
15
+ - **Real-Time Speech-to-Text**: Uses WhisperLive to convert
16
+ spoken language into text in real-time.
17
+
18
+ - **Large Language Model Integration**: Adds Mistral, a Large Language
19
+ Model, to enhance the understanding and context of the transcribed
20
+ text.
21
 
22
+ - **TensorRT Optimization**: Both Mistral and Whisper are optimized to
23
+ run as TensorRT engines, ensuring high-performance and low-latency
24
+ processing.
25
 
26
  ## Prerequisites
27
+
28
+ Install
29
+ [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md)
30
+ to build Whisper and Mistral TensorRT engines. The README builds a
31
+ docker image for TensorRT-LLM. Instead of building a docker image, we
32
+ can also refer to the README and the
33
+ [Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi)
34
+ to install the required packages in the base PyTorch docker image. Just
35
+ make sure to use the correct base image as mentioned in the dockerfile
36
+ and the installation should go smoothly.
37
 
38
  ### Build Whisper TensorRT Engine
39
 
40
  > [!NOTE]
41
  >
42
+ > These steps are included in `docker/scripts/build-whisper.sh`
43
 
44
  Change working dir to the [whisper example
45
  dir](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper)
46
  in TensorRT-LLM.
47
 
48
  ``` bash
49
+ cd /root/TensorRT-LLM-examples/whisper
50
  ```
51
 
52
  Currently, by default TensorRT-LLM only supports `large-v2` and
 
56
 
57
  ``` bash
58
  # the sound filter definitions
59
+ wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz
60
  # the small.en model weights
61
  wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
62
  ```
 
85
  ``` bash
86
  pip install -r requirements.txt
87
  python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --use_bert_attention_plugin --model_name small.en
88
+ mkdir -p /root/scratch-space/models
89
+ cp -r whisper_small_en /root/scratch-space/models
90
  ```
91
 
92
  ### Build Mistral TensorRT Engine
93
+
94
+ > [!NOTE]
95
+ >
96
+ > These steps are included in `docker/scripts/build-mistral.sh`
97
+
98
+ ``` bash
99
+ cd /root/TensorRT-LLM-examples/llama
100
  ```
101
+
102
+ Build the Mistral TensorRT engine in `fp16`
103
+
104
+ ``` bash
105
  python build.py --model_dir teknium/OpenHermes-2.5-Mistral-7B \
106
  --dtype float16 \
107
  --remove_input_padding \
 
109
  --enable_context_fmha \
110
  --use_gemm_plugin float16 \
111
  --output_dir ./tmp/mistral/7B/trt_engines/fp16/1-gpu/ \
112
+ --max_input_len 5000 \
113
  --max_batch_size 1
114
+ mkdir -p /root/scratch-space/models
115
+ cp -r tmp/mistral/7B/trt_engines/fp16/1-gpu /root/scratch-space/models/mistral
116
  ```
117
 
118
  ### Build Phi TensorRT Engine
119
+
120
+ > [!NOTE]
121
+ >
122
+ > These steps are included in `docker/scripts/build-phi-2.sh`
123
+
124
+ Note: Phi is only available in the main branch and hasn't been released yet.
125
+ So, make sure to build TensorRT-LLM from the main branch.
126
+
127
+ ``` bash
128
+ cd /root/TensorRT-LLM-examples/phi
129
  ```
130
+
131
+ Build the Phi-2 TensorRT engine in `fp16`
132
+
133
+ ``` bash
134
  git lfs install
135
+ phi_path=$(huggingface-cli download --repo-type model --revision 834565c23f9b28b96ccbeabe614dd906b6db551a microsoft/phi-2)
136
  python3 build.py --dtype=float16 \
137
  --log_level=verbose \
138
  --use_gpt_attention_plugin float16 \
 
140
  --max_batch_size=16 \
141
  --max_input_len=1024 \
142
  --max_output_len=1024 \
143
+ --output_dir=phi-2 \
144
+ --model_dir="$phi_path" 2>&1 | tee build.log
145
+ dest=/root/scratch-space/models
146
+ mkdir -p "$dest/phi-2/tokenizer"
147
+ cp -r phi-2 "$dest"
148
+ (cd "$phi_path" && cp config.json tokenizer_config.json vocab.json merges.txt "$dest/phi-2/tokenizer")
149
+ cp -r "$phi_path" "$dest/phi-orig-model"
150
  ```
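
The build command above pipes its output into `tee`, so by default the pipeline reports the exit status of `tee` rather than that of `build.py`. The following is a minimal sketch of how `set -o pipefail` changes that. It uses `false` as a stand-in for a failing build step, which is an assumption for illustration and not part of the repository scripts.

```bash
#!/bin/bash
# Stand-in for a failing build step; `false` always exits with status 1.
failing_build() { false; }

# Without pipefail the pipeline reports tee's status (0), hiding the failure.
set +o pipefail
failing_build 2>&1 | tee /dev/null
status_without=$?

# With pipefail the pipeline reports the last non-zero status in the pipe.
set -o pipefail
failing_build 2>&1 | tee /dev/null
status_with=$?
set +o pipefail

echo "without pipefail: $status_without, with pipefail: $status_with"
```

In a script started with `bash -e`, enabling `pipefail` before the build is one way to make a failed engine build stop the remaining steps.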
151
 
152
+ ## Build WhisperBot
153
+
154
+ > [!NOTE]
155
+ >
156
+ > These steps are included in `docker/scripts/setup-whisperbot.sh`
157
+
158
+ Clone this repo and install requirements
159
+
160
+ ``` bash
161
+ [ -d "WhisperBot" ] || git clone https://github.com/collabora/WhisperBot.git
162
  cd WhisperBot
163
  apt update
164
  apt install ffmpeg portaudio19-dev -y
165
+ ```
166
+
167
+ Install torchaudio matching the PyTorch from the base image
168
+
169
+ ``` bash
170
+ pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
171
+ ```
172
+
173
+ Install all the other dependencies normally
174
+
175
+ ``` bash
176
  pip install -r requirements.txt
177
+ pip install openai-whisper whisperspeech soundfile
178
  ```
179
 
180
+ Force-update huggingface_hub (tokenizers 0.14.1 spuriously requires an
181
+ ancient \<=0.18 version)
182
+
183
+ ``` bash
184
+ pip install -U huggingface_hub
185
+ huggingface-cli download collabora/whisperspeech t2s-small-en+pl.model s2a-q4-tiny-en+pl.model
186
+ huggingface-cli download charactr/vocos-encodec-24khz
187
+ mkdir -p /root/.cache/torch/hub/checkpoints/
188
+ curl -L -o /root/.cache/torch/hub/checkpoints/encodec_24khz-d7cc33bc.th https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
189
+ mkdir -p /root/.cache/whisper-live/
190
+ curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
191
+ python -c 'from transformers.utils.hub import move_cache; move_cache()'
192
  ```
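
The `curl` downloads above fetch the files again on every run. As a sketch only, a small helper can skip files that are already cached, in the same spirit as the `[ -d "WhisperBot" ] || git clone` guard. The function name `fetch_once` is hypothetical and does not exist in the repository.

```bash
#!/bin/bash
# Hypothetical helper: download a file only if it is not already present.
fetch_once() {
  local url="$1" dest="$2"
  # -s is true when the file exists and is not empty.
  if [ -s "$dest" ]; then
    echo "already present: $dest"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  # Download to a temporary name so an interrupted transfer
  # never leaves a partial file at $dest.
  curl -fL -o "$dest.part" "$url" && mv "$dest.part" "$dest"
}

# Example with a URL and path taken from the steps above
# (commented out so the sketch makes no network requests):
# fetch_once https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx \
#   /root/.cache/whisper-live/silero_vad.onnx
```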
193
 
194
+ ### Run WhisperBot with Whisper and Mistral/Phi-2
195
+
196
+ Use the Whisper TensorRT engine folder, plus the engine folder and
197
+ tokenizer path for Mistral/Phi-2, as produced in the build phase. If a
198
+ Hugging Face model was used to build Mistral/Phi-2, pass the
199
+ Hugging Face repo name as the tokenizer path.
200
+
201
+ > [!NOTE]
202
+ >
203
+ > These steps are included in `docker/scripts/run-whisperbot.sh`
204
+
205
+ ``` bash
206
+ test -f /etc/shinit_v2 && source /etc/shinit_v2
207
+ cd WhisperBot
208
+ if [ "$1" != "mistral" ]; then
209
+ exec python3 main.py --phi \
210
+ --whisper_tensorrt_path /root/whisper_small_en \
211
+ --phi_tensorrt_path /root/phi-2 \
212
+ --phi_tokenizer_path /root/phi-2
213
+ else
214
+ exec python3 main.py --mistral \
215
+ --whisper_tensorrt_path /root/models/whisper_small_en \
216
+ --mistral_tensorrt_path /root/models/mistral \
217
+ --mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B
218
+ fi
219
  ```
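
To check the paths before launching, the backend selection above can be wrapped in a function that prints the command instead of running it. This is a sketch under assumptions: `whisperbot_cmd` and `MODELS_DIR` are hypothetical names, and the default directory and the `phi-2/tokenizer` location are taken from where the build steps copy the engines, which may differ from the paths the container actually mounts.

```bash
#!/bin/bash
# Assumed location of the built engines; override MODELS_DIR to match your setup.
MODELS_DIR="${MODELS_DIR:-/root/scratch-space/models}"

# Hypothetical helper: print the main.py command for the chosen backend.
# Like the script above, anything other than "mistral" selects Phi-2.
whisperbot_cmd() {
  local backend="${1:-phi}"
  if [ "$backend" = "mistral" ]; then
    echo "python3 main.py --mistral" \
      "--whisper_tensorrt_path $MODELS_DIR/whisper_small_en" \
      "--mistral_tensorrt_path $MODELS_DIR/mistral" \
      "--mistral_tokenizer_path teknium/OpenHermes-2.5-Mistral-7B"
  else
    echo "python3 main.py --phi" \
      "--whisper_tensorrt_path $MODELS_DIR/whisper_small_en" \
      "--phi_tensorrt_path $MODELS_DIR/phi-2" \
      "--phi_tokenizer_path $MODELS_DIR/phi-2/tokenizer"
  fi
}

# Print the command for the backend given as the first argument.
whisperbot_cmd "$@"
```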
220
 
221
+ - On the client side clone the repo, install the requirements and
222
+ execute `run_client.py`
223
+
224
+ ``` bash
225
  cd WhisperBot
226
  pip install -r requirements.txt
227
  python3 run_client.py
228
  ```
229
 
 
230
  ## Contact Us
231
+
232
+ For questions or issues, please open an issue. Contact us at:
233
+ marcus.edel@collabora.com, jpc@collabora.com,
234
+ vineet.suryan@collabora.com
README.qmd CHANGED
@@ -29,7 +29,7 @@ These steps are included in `{fname}`
29
 
30
  # WhisperBot
31
 
32
- Welcome to WhisperBot. WhisperBot builds upon the capabilities of the [WhisperLive]() by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper, a powerful automatic speech recognition (ASR) system. Both Mistral and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.
33
 
34
  ## Features
35
  - **Real-Time Speech-to-Text**: Uses WhisperLive to convert spoken language into text in real-time.
@@ -45,19 +45,19 @@ Instead of building a docker image, we can also refer to the README and the [Doc
45
  ### Build Whisper TensorRT Engine
46
 
47
  ```{python}
48
- include_file('docker/scripts/setup-whisper.sh')
49
  ```
50
 
51
  ### Build Mistral TensorRT Engine
52
 
53
  ```{python}
54
- include_file('docker/scripts/setup-mistral.sh')
55
  ```
56
 
57
  ### Build Phi TensorRT Engine
58
 
59
  ```{python}
60
- include_file('docker/scripts/setup-phi-2.sh')
61
  ```
62
 
63
  ## Build WhisperBot
 
29
 
30
  # WhisperBot
31
 
32
+ Welcome to WhisperBot. WhisperBot builds upon the capabilities of the [WhisperLive](https://github.com/collabora/WhisperLive) and [WhisperSpeech](https://github.com/collabora/WhisperSpeech) by integrating Mistral, a Large Language Model (LLM), on top of the real-time speech-to-text pipeline. WhisperLive relies on OpenAI Whisper, a powerful automatic speech recognition (ASR) system. Both Mistral and Whisper are optimized to run efficiently as TensorRT engines, maximizing performance and real-time processing capabilities.
33
 
34
  ## Features
35
  - **Real-Time Speech-to-Text**: Uses WhisperLive to convert spoken language into text in real-time.
 
45
  ### Build Whisper TensorRT Engine
46
 
47
  ```{python}
48
+ include_file('docker/scripts/build-whisper.sh')
49
  ```
50
 
51
  ### Build Mistral TensorRT Engine
52
 
53
  ```{python}
54
+ include_file('docker/scripts/build-mistral.sh')
55
  ```
56
 
57
  ### Build Phi TensorRT Engine
58
 
59
  ```{python}
60
+ include_file('docker/scripts/build-phi-2.sh')
61
  ```
62
 
63
  ## Build WhisperBot
docker/scripts/setup-whisperbot.sh CHANGED
@@ -7,15 +7,11 @@ cd WhisperBot
7
  apt update
8
  apt install ffmpeg portaudio19-dev -y
9
 
10
- ## NVidia containers are based on unreleased PyTorch versions so we have to manually install
11
- ## torchaudio from source (`pip install torchaudio` would pull all new PyTorch and CUDA versions)
12
- #apt install -y cmake
13
- #TORCH_CUDA_ARCH_LIST="8.9 9.0" pip install --no-build-isolation git+https://github.com/pytorch/audio.git
14
 
15
  ## Install all the other dependencies normally
16
- pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
17
  pip install -r requirements.txt
18
- pip install openai-whisper whisperspeech soundfile
19
 
20
  ## force update huggingface_hub (tokenizers 0.14.1 spuriously requires an ancient <=0.18 version)
21
  pip install -U huggingface_hub
@@ -29,4 +25,3 @@ mkdir -p /root/.cache/whisper-live/
29
  curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
30
 
31
  python -c 'from transformers.utils.hub import move_cache; move_cache()'
32
-
 
7
  apt update
8
  apt install ffmpeg portaudio19-dev -y
9
 
10
+ ## Install torchaudio matching the PyTorch from the base image
11
+ pip install --extra-index-url https://download.pytorch.org/whl/cu121 torchaudio
 
 
12
 
13
  ## Install all the other dependencies normally
 
14
  pip install -r requirements.txt
 
15
 
16
  ## force update huggingface_hub (tokenizers 0.14.1 spuriously requires an ancient <=0.18 version)
17
  pip install -U huggingface_hub
 
25
  curl -L -o /root/.cache/whisper-live/silero_vad.onnx https://github.com/snakers4/silero-vad/raw/master/files/silero_vad.onnx
26
 
27
  python -c 'from transformers.utils.hub import move_cache; move_cache()'
 
docker/scripts/setup.sh ADDED
@@ -0,0 +1,6 @@
1
+ #!/bin/bash -e
2
+
3
+ ./setup-whisper.sh
4
+ #./setup-mistral.sh
5
+ ./setup-phi-2.sh
6
+ ./setup-whisperbot.sh
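
The script above calls each step through a relative `./` path, so it only works when run from `docker/scripts`. A possible variant, sketched here under assumptions, resolves the directory of the script itself and reports which step failed. The function `run_steps` is hypothetical; the step names in the commented example are the ones listed in the diff.

```bash
#!/bin/bash
# Run each named script from a directory in order; stop at the first failure.
run_steps() {
  local dir="$1"; shift
  local step
  for step in "$@"; do
    echo "==> $step"
    bash -e "$dir/$step" || { echo "step failed: $step" >&2; return 1; }
  done
}

# Directory containing this script, so steps resolve regardless of the caller's cwd.
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Example invocation (commented out so the sketch has no side effects):
# run_steps "$script_dir" setup-whisper.sh setup-phi-2.sh setup-whisperbot.sh
```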
requirements.txt CHANGED
@@ -6,4 +6,7 @@ scipy
6
  websocket-client
7
  tiktoken==0.3.3
8
  kaldialign
9
- braceexpand
6
  websocket-client
7
  tiktoken==0.3.3
8
  kaldialign
9
+ braceexpand
10
+ openai-whisper
11
+ whisperspeech
12
+ soundfile