aadnk committed on
Commit 07447cb
1 Parent(s): 530547e

Update README

Files changed (2):
  1. README.md +8 -2
  2. docs/options.md +20 -12
README.md CHANGED
@@ -76,6 +76,12 @@ cores (up to 8):
 python app.py --input_audio_max_duration -1 --auto_parallel True
 ```
 
+### Multiple Files
+
+You can upload multiple files either through the "Upload files" option, or as a playlist on YouTube.
+Each audio file will then be processed in turn, and the resulting SRT/VTT/Transcript will be made available in the "Download" section.
+When more than one file is processed, the UI will also generate an "All_Output" zip file containing all the text output files.
+
 # Docker
 
 To run it in Docker, first install Docker and optionally the NVIDIA Container Toolkit in order to use the GPU.
@@ -109,7 +115,7 @@ You can also pass custom arguments to `app.py` in the Docker container, for inst
 sudo docker run -d --gpus all -p 7860:7860 \
   --mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
   --restart=on-failure:15 registry.gitlab.com/aadnk/whisper-webui:latest \
-  app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --vad_parallel_devices 0,1 \
+  app.py --input_audio_max_duration -1 --server_name 0.0.0.0 --auto_parallel True \
   --default_vad silero-vad --default_model_name large
 ```
@@ -119,7 +125,7 @@ sudo docker run --gpus all \
   --mount type=bind,source=/home/administrator/.cache/whisper,target=/root/.cache/whisper \
   --mount type=bind,source=${PWD},target=/app/data \
   registry.gitlab.com/aadnk/whisper-webui:latest \
-  cli.py --model large --vad_parallel_devices 0,1 --vad silero-vad \
+  cli.py --model large --auto_parallel True --vad silero-vad \
   --output_dir /app/data /app/data/YOUR-FILE-HERE.mp4
 ```
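For reference, a sketch of the equivalent invocation outside Docker, assuming a local checkout of the repository (the output directory value is illustrative; all flags appear in the examples above, and the input file name is a placeholder):

```
# Hypothetical local run; requires the repository's dependencies to be installed
python cli.py --model large --auto_parallel True --vad silero-vad \
  --output_dir ./output YOUR-FILE-HERE.mp4
```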
docs/options.md CHANGED
@@ -3,18 +3,19 @@ To transcribe or translate an audio file, you can either copy an URL from a webs
 supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
 in the file selector to select any file type, including video files) or use the microphone.
 
-For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.
+For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option, especially if you are using the `large-v1` model. Note that `large-v2` is a lot more forgiving, but you may still want to use a VAD with a slightly higher "VAD - Max Merge Size (s)" (60 seconds or more).
 
 ## Model
 Select the model that Whisper will use to transcribe the audio:
 
-| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
-|--------|------------|--------------------|--------------------|---------------|----------------|
-| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
-| base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
-| small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
-| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
-| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
+| Size      | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
+|-----------|------------|--------------------|--------------------|---------------|----------------|
+| tiny      | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
+| base      | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
+| small     | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
+| medium    | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
+| large     | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
+| large-v2  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
 
 ## Language
 
@@ -24,10 +25,12 @@ Note that if the selected language and the language in the audio differs, Whispe
 language. For instance, if the audio is in English but you select Japanese, the model may translate the audio to Japanese.
 
 ## Inputs
-The options "URL (YouTube, etc.)", "Upload Audio" or "Microphone Input" allow you to send an audio input to the model.
+The options "URL (YouTube, etc.)", "Upload Files" or "Microphone Input" allow you to send an audio input to the model.
 
-Note that the UI will only process the first valid input - i.e. if you enter both an URL and upload an audio, it will only process
-the URL.
+### Multiple Files
+Note that the UI will only process either the given URL or the uploaded files (including microphone) - not both.
+
+But you can upload multiple files either through the "Upload files" option, or as a playlist on YouTube. Each audio file will then be processed in turn, and the resulting SRT/VTT/Transcript will be made available in the "Download" section. When more than one file is processed, the UI will also generate an "All_Output" zip file containing all the text output files.
 
 ## Task
 Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
@@ -75,4 +78,9 @@ number of seconds after the line has finished. For instance, if a line ends at 1
 10:04, the line's text will be included if the prompt window is 4 seconds or more (10:04 - 10:00 = 4 seconds).
 
 Note that detected lines in gaps between speech sections will not be included in the prompt
-(if silero-vad or silero-vad-expand-into-gaps) is used.
+(if silero-vad or silero-vad-expand-into-gaps is used).
+
+# Command Line Options
+
+Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
+CPU/GPU cores, the default model name/VAD and so on. Consult the README in the root folder for more information.
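As a sketch, the options named in this commit can be combined in a single launch command; the flag values below are illustrative, and every flag shown appears elsewhere in this commit:

```
# Hypothetical launch: web UI with no duration limit, parallel execution,
# and the defaults pre-set to the large model and Silero VAD
python app.py --input_audio_max_duration -1 --auto_parallel True \
  --default_model_name large --default_vad silero-vad
```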