title: Whisper Webui
emoji: ⚡
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 3.23.0
app_file: app.py
pinned: false
license: apache-2.0
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Running Locally
To run this program locally, first install Python 3.9+ and Git. Then install PyTorch 1.10+ and all the other dependencies:
```
pip install -r requirements.txt
```
You can find detailed instructions for how to install this on Windows 10/11 [here (PDF)](docs/windows/install_win10_win11.pdf).
Finally, run the full version (no audio length restrictions) of the app with parallel CPU/GPU enabled:
```
python app.py --input_audio_max_duration -1 --server_name 127.0.0.1 --auto_parallel True
```
You can also run the CLI, which is similar to Whisper's own CLI but supports the following additional arguments:
```
python cli.py \
[--vad {none,silero-vad,silero-vad-skip-gaps,silero-vad-expand-into-gaps,periodic-vad}] \
[--vad_merge_window VAD_MERGE_WINDOW] \
[--vad_max_merge_size VAD_MAX_MERGE_SIZE] \
[--vad_padding VAD_PADDING] \
[--vad_prompt_window VAD_PROMPT_WINDOW] \
[--vad_cpu_cores NUMBER_OF_CORES] \
[--vad_parallel_devices COMMA_DELIMITED_DEVICES] \
[--auto_parallel BOOLEAN]
```
You can also pass URLs instead of file paths as input:
```
python cli.py --model large --vad silero-vad --language Japanese "https://www.youtube.com/watch?v=4cICErqqRSM"
```
Rather than supplying arguments to `app.py` or `cli.py`, you can also use the configuration file [config.json5](config.json5). See that file for more information.
To use a different configuration file, set the `WHISPER_WEBUI_CONFIG` environment variable to the path of that file.
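
The lookup described above can be pictured with a minimal Python sketch. It is not the project's actual loading code: the function name `resolve_config_path` and the constant `DEFAULT_CONFIG` are hypothetical, and the only facts taken from this README are the variable name `WHISPER_WEBUI_CONFIG` and the default file `config.json5`.

```python
import os
from pathlib import Path

# Assumption: the default file sits next to app.py, as the link above suggests.
DEFAULT_CONFIG = "config.json5"


def resolve_config_path(environ=os.environ) -> Path:
    """Return the config path, preferring WHISPER_WEBUI_CONFIG when it is set."""
    override = environ.get("WHISPER_WEBUI_CONFIG")
    # An unset or empty variable falls back to the default file.
    return Path(override) if override else Path(DEFAULT_CONFIG)
```

With the variable unset, `resolve_config_path()` yields `config.json5`; after exporting `WHISPER_WEBUI_CONFIG` with another path, it yields that path instead.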
### Multiple Files
You can upload multiple files either through the "Upload files" option, or as a playlist on YouTube.
Each audio file will then be processed in turn, and the resulting SRT/VTT/Transcript will be made available in the "Download" section.
When more than one file is processed, the UI will also generate an "All_Output" zip file containing all the text output files.
## Diarization
To detect different speakers in the audio, you can use the [whisper-diarization](https://gitlab.com/aadnk/whisper-diarization) application.
Download the JSON file after running Whisper on an audio file, and then run `app.py` in the
whisper-diarization repository with the audio file and the JSON file as arguments.
## Whisper Implementation
You can choose between using `whisper` or `faster-whisper`. [Faster Whisper](https://github.com/guillaumekln/faster-whisper) is a drop-in replacement for the
default Whisper that achieves up to a 4x speedup and a 2x reduction in memory usage.
You can install the requirements for a specific Whisper implementation from `requirements-fasterWhisper.txt`
or `requirements-whisper.txt`:
```
pip install -r requirements-fasterWhisper.txt
```
Then run the app or the CLI with the `--whisper_implementation faster-whisper` flag:
```
python app.py --whisper_implementation faster-whisper --input_audio_max_duration -1 --server_name 127.0.0.1 --auto_parallel True
```
You can also select the Whisper implementation in `config.json5`:
```json5
{
    "whisper_implementation": "faster-whisper"
}
```
### GPU Acceleration
To use GPU acceleration with Faster Whisper, both CUDA 11.2 and cuDNN 8 must be installed. You may want to install them in a virtual environment such as Anaconda.
## Google Colab
If you don't have a GPU powerful enough to run the larger models, you can also run this Web UI directly on [Google Colab](https://colab.research.google.com/drive/1qeTSvi7Bt_5RMm88ipW4fkcsMOKlDDss?usp=sharing).
See the [colab documentation](docs/colab.md) for more information.
## Parallel Execution
You can also run either the Web UI or the CLI on multiple GPUs in parallel, using the `vad_parallel_devices` option. This takes a comma-delimited list of
device IDs (0, 1, etc.) that Whisper should be distributed to and run on concurrently:
```
python cli.py --model large --vad silero-vad --language Japanese \
--vad_parallel_devices 0,1 "https://www.youtube.com/watch?v=4cICErqqRSM"
```
Note that this requires a VAD to function properly; otherwise only the first GPU will be used. You can use `periodic-vad` to avoid the cost
of running Silero VAD, at a slight cost to accuracy.
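
To make the device list and the VAD requirement more concrete, here is a small Python sketch. It is an illustration under assumptions, not the project's scheduler: the helpers `parse_device_ids` and `split_across_devices` are hypothetical names, and the contiguous-chunk strategy is only one plausible way to spread VAD segments over devices.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) of a speech segment, in seconds


def parse_device_ids(value: str) -> List[str]:
    """Split a comma-delimited string such as "0,1" into device IDs."""
    return [part.strip() for part in value.split(",") if part.strip()]


def split_across_devices(
    segments: Sequence[Segment], devices: Sequence[str]
) -> Dict[str, List[Segment]]:
    """Assign segments to devices in contiguous, near-equal chunks."""
    if not devices:
        raise ValueError("at least one device is required")
    chunk, extra = divmod(len(segments), len(devices))
    plan: Dict[str, List[Segment]] = {}
    start = 0
    for index, device in enumerate(devices):
        # The first `extra` devices take one additional segment each.
        end = start + chunk + (1 if index < extra else 0)
        plan[device] = list(segments[start:end])
        start = end
    return plan
```

In this sketch, audio that has not been split by a VAD arrives as a single segment, so only the first device receives any work. That matches the behavior noted above, although the real cause inside the application may differ.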
This is achieved by creating N child processes (where N is the number of selected devices), in which Whisper is run concurrently. In `app.py`, you can also
set the `vad_process_timeout` option. This configures the number of seconds until a process is killed due to inactivity, freeing RAM and video memory.
The default value is 30 minutes (1800 seconds).
```
python app.py --input_audio_max_duration -1 --vad_parallel_devices 0,1 --vad_process_timeout 3600
```
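
The inactivity rule can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not the application's code: the class `IdleTimeout` and its methods are invented for this example, and the caller is assumed to terminate the child process once `expired()` returns `True`.

```python
import time
from typing import Optional

THIRTY_MINUTES = 30 * 60  # the default stated above, expressed in seconds


class IdleTimeout:
    """Tracks the last activity time and reports when the timeout has elapsed."""

    def __init__(self, timeout_seconds: float, now: Optional[float] = None):
        self.timeout_seconds = timeout_seconds
        self.last_used = time.monotonic() if now is None else now

    def touch(self, now: Optional[float] = None) -> None:
        """Record activity, restarting the inactivity countdown."""
        self.last_used = time.monotonic() if now is None else now

    def expired(self, now: Optional[float] = None) -> bool:
        """Return True once the process has been idle for the full timeout."""
        current = time.monotonic() if now is None else now
        return current - self.last_used >= self.timeout_seconds
```

The value `3600` in the command above corresponds to 60 minutes, twice the default. The optional `now` argument exists only so that the timer can be exercised without waiting.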
To execute the Silero VAD itself in parallel, use the `vad_cpu_cores` option:
```
python app.py --input_audio_max_duration -1 --vad_parallel_devices 0,1 --vad_process_timeout 3600 --vad_cpu_cores 4
```
You may also use `vad_process_timeout` with a single device (`--vad_parallel_devices 0`), if you prefer to always free video memory after a period of time.
### Auto Parallel
You can also set `auto_parallel` to `True`. This sets `vad_parallel_devices` to use all the GPU devices on the system, and `vad_cpu_cores` to the number of
CPU cores (up to 8):
```
python app.py --input_audio_max_duration -1 --auto_parallel True
```
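
As a rough picture of what `auto_parallel` resolves to, the following Python sketch derives both values from a GPU count and a CPU count. The function `auto_parallel_settings` is a hypothetical helper, not part of the application; how the application itself counts GPUs is not stated here, so the count is passed in as a parameter (it could come from `torch.cuda.device_count()`, for instance).

```python
import os
from typing import Optional, Tuple

MAX_AUTO_CPU_CORES = 8  # upper bound mentioned in the text above


def auto_parallel_settings(
    gpu_count: int, cpu_count: Optional[int] = None
) -> Tuple[str, int]:
    """Return (vad_parallel_devices, vad_cpu_cores) for the given hardware."""
    if cpu_count is None:
        # os.cpu_count() may return None on some platforms; assume one core then.
        cpu_count = os.cpu_count() or 1
    devices = ",".join(str(index) for index in range(gpu_count))
    cores = min(cpu_count, MAX_AUTO_CPU_CORES)
    return devices, cores
```

On a machine with two GPUs and sixteen cores, this yields the device string `0,1` and eight VAD cores, the same values you could pass by hand through `--vad_parallel_devices` and `--vad_cpu_cores`.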
# Docker
To run it in Docker, first install Docker and, optionally, the NVIDIA Container Toolkit in order to use the GPU.
Then either use the GitLab-hosted container below, or check out this repository and build an image:
```
sudo docker build -t whisper-webui:1 .
```
You can then start the Web UI with GPU support like so:
```
sudo docker run -d --gpus=all -p 7860:7860 whisper-webui:1
```
Leave out `--gpus=all` if you don't have access to a GPU with enough memory and are fine with running on the CPU only:
```
sudo docker run -d -p 7860:7860 whisper-webui:1
```
# GitLab Docker Registry
This Docker container is also hosted on GitLab:
```
sudo docker run -d --gpus=all -p 7860:7860 registry.gitlab.com/aadnk/whisper-webui:latest
```
## Custom Arguments
You can also pass custom arguments to `app.py` in the Docker container, for instance to use all the GPUs in parallel (replace `administrator` with your user name):
```
sudo docker run -d --gpus all -p 7860:7860 \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
--mount type=bind,source=/home/administrator/.cache/huggingface,target=/root/.cache/huggingface \
--restart=on-failure:15 registry.gitlab.com/aadnk/whisper-webui:latest \
app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True \
--default_vad silero-vad --default_model_name large
```
You can also call `cli.py` the same way:
```
sudo docker run --gpus all \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
--mount type=bind,source=/home/administrator/.cache/huggingface,target=/root/.cache/huggingface \
--mount type=bind,source=${PWD},target=/app/data \
registry.gitlab.com/aadnk/whisper-webui:latest \
cli.py --model large --auto_parallel True --vad silero-vad \
--output_dir /app/data /app/data/YOUR-FILE-HERE.mp4
```
## Caching
Note that the models themselves are currently not included in the Docker images, and will be downloaded on demand.
To avoid repeated downloads, bind the directory `/root/.cache/whisper` to a directory on the host (for instance `/home/administrator/.cache/whisper`), which you can (optionally)
prepopulate with the different Whisper models.
```
sudo docker run -d --gpus=all -p 7860:7860 \
--mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
registry.gitlab.com/aadnk/whisper-webui:latest
```