Fix options.md
docs/options.md +15 -8
docs/options.md
CHANGED
|
@@ -1,5 +1,7 @@
|
|
| 1 |
# Options
|
| 2 |
-
To transcribe or translate an audio file, you can either copy a URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
|
| 3 |
|
| 4 |
For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.
|
| 5 |
|
|
@@ -18,12 +20,14 @@ Select the model that Whisper will use to transcribe the audio:
|
|
| 18 |
|
| 19 |
Select the language, or leave it empty for Whisper to automatically detect it.
|
| 20 |
|
| 21 |
-
Note that if the selected language and the language in the audio differ, Whisper may start to translate the audio to the selected
|
| 22 |
|
| 23 |
## Inputs
|
| 24 |
The options "URL (YouTube, etc.)", "Upload Audio" or "Microphone Input" allow you to send an audio input to the model.
|
| 25 |
|
| 26 |
-
Note that the UI will only process the first valid input - i.e. if you enter both a URL and upload an audio file, it will only process
|
| 27 |
|
| 28 |
## Task
|
| 29 |
Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
|
|
@@ -32,14 +36,17 @@ Select the task - either "transcribe" to transcribe the audio to text, or "trans
|
|
| 32 |
* none
|
| 33 |
* Run whisper on the entire audio input
|
| 34 |
* silero-vad
|
| 35 |
-
* Use Silero VAD to detect sections that contain speech, and run Whisper independently on each section. Whisper is also run
|
| 36 |
* silero-vad-skip-gaps
|
| 37 |
-
* As above, but sections that don't contain speech according to Silero will be skipped. This will be slightly faster, but
|
| 38 |
* periodic-vad
|
| 39 |
-
* Create sections of speech every 'VAD - Max Merge Size' seconds. This is very fast and simple, but will potentially break
|
| 40 |
|
| 41 |
## VAD - Merge Window
|
| 42 |
-
If set, any adjacent speech sections that are at most this number of seconds apart will be automatically merged.
|
| 43 |
|
| 44 |
## VAD - Max Merge Size (s)
|
| 45 |
-
Disables merging of adjacent speech sections if they are this number of seconds long.
|
|
|
|
| 1 |
# Options
|
| 2 |
+
To transcribe or translate an audio file, you can either copy a URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
|
| 3 |
+
supported by yt-dlp will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
|
| 4 |
+
in the file selector to select any file type, including video files) or use the microphone.
|
| 5 |
|
| 6 |
For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.
|
| 7 |
|
|
|
|
| 20 |
|
| 21 |
Select the language, or leave it empty for Whisper to automatically detect it.
|
| 22 |
|
| 23 |
+
Note that if the selected language and the language in the audio differ, Whisper may start to translate the audio to the selected
|
| 24 |
+
language. For instance, if the audio is in English but you select Japanese, the model may translate the audio to Japanese.
|
| 25 |
|
| 26 |
## Inputs
|
| 27 |
The options "URL (YouTube, etc.)", "Upload Audio" or "Microphone Input" allow you to send an audio input to the model.
|
| 28 |
|
| 29 |
+
Note that the UI will only process the first valid input - i.e. if you enter both a URL and upload an audio file, it will only process
|
| 30 |
+
the URL.
|
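The "first valid input" rule can be illustrated with a minimal sketch. This is not the UI's actual code: the function name `pick_first_input`, its parameters, and the assumption that the microphone is checked last are all hypothetical. The text above only states that a URL takes priority over an upload.

```python
from typing import Optional, Tuple


def pick_first_input(
    url: Optional[str],
    upload_path: Optional[str],
    microphone_path: Optional[str],
) -> Optional[Tuple[str, str]]:
    """Return (kind, value) for the first non-empty input, or None.

    Assumed priority: URL, then uploaded file, then microphone recording.
    """
    candidates = [
        ("url", url),
        ("upload", upload_path),
        ("microphone", microphone_path),
    ]
    for kind, value in candidates:
        # Treat None and whitespace-only strings as "not provided".
        if value is not None and value.strip():
            return kind, value.strip()
    return None
```

With both a URL and an uploaded file supplied, the sketch returns the URL entry, matching the behavior described above.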
| 31 |
|
| 32 |
## Task
|
| 33 |
Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
|
|
|
|
| 36 |
* none
|
| 37 |
* Run whisper on the entire audio input
|
| 38 |
* silero-vad
|
| 39 |
+
* Use Silero VAD to detect sections that contain speech, and run Whisper independently on each section. Whisper is also run
|
| 40 |
+
on the gaps between each speech section.
|
| 41 |
* silero-vad-skip-gaps
|
| 42 |
+
* As above, but sections that don't contain speech according to Silero will be skipped. This will be slightly faster, but
|
| 43 |
+
may cause dialogue to be skipped.
|
| 44 |
* periodic-vad
|
| 45 |
+
* Create sections of speech every 'VAD - Max Merge Size' seconds. This is very fast and simple, but will potentially break
|
| 46 |
+
a sentence or word in two.
|
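To make the periodic-vad option concrete, here is a rough sketch of cutting an audio file into fixed-length sections. The function name `periodic_sections` and its parameters are hypothetical, and the real implementation may differ. The sketch only shows why this mode is fast (no speech detection is involved) and why a boundary can land in the middle of a word.

```python
from typing import List, Tuple


def periodic_sections(duration: float, period: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive sections of at most `period` seconds."""
    if period <= 0:
        raise ValueError("period must be positive")
    sections: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        # The last section is shortened so it never runs past the end of the audio.
        end = min(start + period, duration)
        sections.append((start, end))
        start = end
    return sections
```

For example, a 70-second file with a 30-second period yields three sections, the last one 10 seconds long. The cut points depend only on elapsed time, not on where speech starts or stops.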
| 47 |
|
| 48 |
## VAD - Merge Window
|
| 49 |
+
If set, any adjacent speech sections that are at most this number of seconds apart will be automatically merged.
|
| 50 |
|
| 51 |
## VAD - Max Merge Size (s)
|
| 52 |
+
Disables merging of adjacent speech sections if they are this number of seconds long.
|
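The two merge settings can be sketched together as follows. This is only an illustration under stated assumptions, not the project's code: `merge_sections` and its parameter names are hypothetical, and the sketch assumes that "Max Merge Size" limits the combined length of a merged section.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (start, end) in seconds


def merge_sections(
    sections: List[Section],
    merge_window: float,
    max_merge_size: float,
) -> List[Section]:
    """Merge adjacent speech sections whose gap is at most `merge_window`,
    unless the merged section would exceed `max_merge_size` seconds."""
    merged: List[Section] = []
    for start, end in sorted(sections):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            # Assumption: the size limit applies to the merged section's length.
            if gap <= merge_window and (end - prev_start) <= max_merge_size:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged
```

Under these assumptions, a larger merge window produces fewer, longer sections for Whisper to process, while the maximum merge size keeps any single section from growing without bound.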