aadnk commited on
Commit
dd6f129
1 Parent(s): fdb8dbd

Add documentation for additional options

Browse files
Files changed (1) hide show
  1. docs/options.md +50 -2
docs/options.md CHANGED
@@ -1,4 +1,4 @@
1
- # Options
2
  To transcribe or translate an audio file, you can either copy an URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
3
  supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
4
  in the file selector to select any file type, including video files) or use the microphone.
@@ -83,4 +83,52 @@ Note that detected lines in gaps between speech sections will not be included in
83
  # Command Line Options
84
 
85
  Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
86
- CPU/GPU cores, the default model name/VAD and so on. Consult the README in the root folder for more information.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Standard Options
2
  To transcribe or translate an audio file, you can either copy an URL from a website (all [websites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md)
3
  supported by YT-DLP will work, including YouTube). Otherwise, upload an audio file (choose "All Files (*.*)"
4
  in the file selector to select any file type, including video files) or use the microphone.
83
  # Command Line Options
84
 
85
  Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
86
+ CPU/GPU cores, the default model name/VAD and so on. Consult the README in the root folder for more information.
87
+
88
+ # Additional Options
89
+
90
+ In addition to the above, there's also a "Full" options interface that allows you to set all the options available in the Whisper
91
+ model. The options are as follows:
92
+
93
+ ## Initial Prompt
94
+ Optional text to provide as a prompt for the first 30 seconds window. Whisper will attempt to use this as a starting point for the transcription, but you can
95
+ also get creative and specify a style or format for the output of the transcription.
96
+
97
+ For instance, if you use the prompt "hello how is it going always use lowercase no punctuation goodbye one two three start stop i you me they", Whisper will
98
+ be biased to output lower capital letters and no punctuation, and may also be biased to output the words in the prompt more often.
99
+
100
+ ## Temperature
101
+ The temperature to use when sampling. Default is 0 (zero). A higher temperature will result in more random output, while a lower temperature will be more deterministic.
102
+
103
+ ## Best Of - Non-zero temperature
104
+ The number of candidates to sample from when sampling with non-zero temperature. Default is 5.
105
+
106
+ ## Beam Size - Zero temperature
107
+ The number of beams to use in beam search when sampling with zero temperature. Default is 5.
108
+
109
+ ## Patience - Zero temperature
110
+ The patience value to use in beam search when sampling with zero temperature. As in https://arxiv.org/abs/2204.05424, the default (1.0) is equivalent to conventional beam search.
111
+
112
+ ## Length Penalty - Any temperature
113
+ The token length penalty coefficient (alpha) to use when sampling with any temperature. As in https://arxiv.org/abs/1609.08144, uses simple length normalization by default.
114
+
115
+ ## Suppress Tokens - Comma-separated list of token IDs
116
+ A comma-separated list of token IDs to suppress during sampling. The default value of "-1" will suppress most special characters except common punctuations.
117
+
118
+ ## Condition on previous text
119
+ If True, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.
120
+
121
+ ## FP16
122
+ Whether to perform inference in fp16. True by default.
123
+
124
+ ## Temperature increment on fallback
125
+ The temperature to increase when falling back when the decoding fails to meet either of the thresholds below. Default is 0.2.
126
+
127
+ ## Compression ratio threshold
128
+ If the gzip compression ratio is higher than this value, treat the decoding as failed. Default is 2.4.
129
+
130
+ ## Logprob threshold
131
+ If the average log probability is lower than this value, treat the decoding as failed. Default is -1.0.
132
+
133
+ ## No speech threshold
134
+ If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. Default is 0.6.