aadnk committed on
Commit a95d6a8
1 Parent(s): 084aa80

Fix options.md

Files changed (1)
  1. docs/options.md +15 -8
docs/options.md CHANGED
@@ -1,5 +1,7 @@
  # Options
- To transcribe or translate an audio file, you can either copy a URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md) supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)" in the file selector to select any file type, including video files) or use the microphone.

  For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.

@@ -18,12 +20,14 @@ Select the model that Whisper will use to transcribe the audio:

  Select the language, or leave it empty for Whisper to automatically detect it.

- Note that if the selected language and the language in the audio differ, Whisper may start to translate the audio to the selected language. For instance, if the audio is in English but you select Japanese, the model may translate the audio to Japanese.

  ## Inputs
  The options "URL (YouTube, etc.)", "Upload Audio" or "Microphone Input" allow you to send an audio input to the model.

- Note that the UI will only process the first valid input, i.e. if you both enter a URL and upload an audio file, it will only process the URL.

  ## Task
  Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
@@ -32,14 +36,17 @@ Select the task - either "transcribe" to transcribe the audio to text, or "trans
  * none
  * Run Whisper on the entire audio input
  * silero-vad
- * Use Silero VAD to detect sections that contain speech, and run Whisper independently on each section. Whisper is also run on the gaps between each speech section.
  * silero-vad-skip-gaps
- * As above, but sections that don't contain speech according to Silero will be skipped. This will be slightly faster, but may cause dialogue to be skipped.
  * periodic-vad
- * Create sections of speech every 'VAD - Max Merge Size' seconds. This is very fast and simple, but will potentially break a sentence or word in two.

  ## VAD - Merge Window
- If set, any adjacent speech sections that are at most this number of seconds apart will be automatically merged."

  ## VAD - Max Merge Size (s)
- Disables merging of adjacent speech sections if they are this number of seconds long."
 
  # Options
+ To transcribe or translate an audio file, you can either copy a URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
+ supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
+ in the file selector to select any file type, including video files) or use the microphone.

  For longer audio files (>10 minutes), it is recommended that you select Silero VAD (Voice Activity Detector) in the VAD option.

 

  Select the language, or leave it empty for Whisper to automatically detect it.

+ Note that if the selected language and the language in the audio differ, Whisper may start to translate the audio to the selected
+ language. For instance, if the audio is in English but you select Japanese, the model may translate the audio to Japanese.

  ## Inputs
  The options "URL (YouTube, etc.)", "Upload Audio" or "Microphone Input" allow you to send an audio input to the model.

+ Note that the UI will only process the first valid input, i.e. if you both enter a URL and upload an audio file, it will only process
+ the URL.

  ## Task
  Select the task - either "transcribe" to transcribe the audio to text, or "translate" to translate it to English.
 
  * none
  * Run Whisper on the entire audio input
  * silero-vad
+ * Use Silero VAD to detect sections that contain speech, and run Whisper independently on each section. Whisper is also run
+ on the gaps between each speech section.
  * silero-vad-skip-gaps
+ * As above, but sections that don't contain speech according to Silero will be skipped. This will be slightly faster, but
+ may cause dialogue to be skipped.
  * periodic-vad
+ * Create sections of speech every 'VAD - Max Merge Size' seconds. This is very fast and simple, but will potentially break
+ a sentence or word in two.

  ## VAD - Merge Window
+ If set, any adjacent speech sections that are at most this number of seconds apart will be automatically merged.

  ## VAD - Max Merge Size (s)
+ Disables merging of adjacent speech sections if they are this number of seconds long.
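
To illustrate how the "VAD - Merge Window" and "VAD - Max Merge Size (s)" options described in this file might interact, here is a minimal Python sketch. It is not taken from the project's code: the function name `merge_sections`, the `(start, end)` tuple representation, and the reading that a merge is skipped when the combined section would exceed the maximum size are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# A speech section as (start, end), both in seconds (assumed representation).
Section = Tuple[float, float]


def merge_sections(
    sections: List[Section],
    merge_window: Optional[float],
    max_merge_size: float,
) -> List[Section]:
    """Merge adjacent speech sections (hypothetical helper, not the project's API).

    Two neighbouring sections are merged when the gap between them is at most
    `merge_window` seconds AND the combined section would be at most
    `max_merge_size` seconds long. If `merge_window` is None, nothing is merged.
    """
    if not sections:
        return []

    ordered = sorted(sections)
    if merge_window is None:
        return ordered

    merged = [ordered[0]]
    for start, end in ordered[1:]:
        prev_start, prev_end = merged[-1]
        gap = start - prev_end
        combined_end = max(prev_end, end)
        combined_length = combined_end - prev_start

        if gap <= merge_window and combined_length <= max_merge_size:
            # Close enough and still short enough: extend the previous section.
            merged[-1] = (prev_start, combined_end)
        else:
            # Too far apart, or the merged section would be too long.
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    speech = [(0.0, 4.0), (5.0, 9.0), (20.0, 25.0), (26.0, 40.0)]
    # The first two sections merge (1 s gap, 9 s combined); the last two stay
    # separate because their combined length (20 s) exceeds the 15 s limit.
    print(merge_sections(speech, merge_window=2.0, max_merge_size=15.0))
    # prints [(0.0, 9.0), (20.0, 25.0), (26.0, 40.0)]
```

Under these assumptions, a larger merge window yields fewer, longer sections for Whisper to process, while the maximum size keeps any single merged section from growing without bound.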