aadnk committed on
Commit 70d1342
1 Parent(s): 32b246d

Add documentation for Diarization

Files changed (1)
  1. docs/options.md +19 -0
docs/options.md CHANGED
@@ -80,6 +80,17 @@ number of seconds after the line has finished. For instance, if a line ends at 1
 Note that detected lines in gaps between speech sections will not be included in the prompt
 (if silero-vad or silero-vad-expand-into-gaps) is used.
 
+## Diarization
+
+If checked, Pyannote will be used to detect speakers in the audio, and label them as (SPEAKER 00), (SPEAKER 01), etc.
+
+This requires a HuggingFace API key to function, which can be supplied with the `--auth_token` command line option for the CLI,
+set in the `config.json5` file for the GUI, or provided via the `HK_AUTH_TOKEN` environment variable.
+
+## Diarization - Speakers
+
+The number of speakers to detect. If set to 0, Pyannote will attempt to detect the number of speakers automatically.
+
 # Command Line Options
 
 Both `app.py` and `cli.py` also accept command line options, such as the ability to enable parallel execution on multiple
@@ -132,3 +143,11 @@ If the average log probability is lower than this value, treat the decoding as f
 
 ## No speech threshold
 If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. Default is 0.6.
+
+## Diarization - Min Speakers
+
+The minimum number of speakers for Pyannote to detect.
+
+## Diarization - Max Speakers
+
+The maximum number of speakers for Pyannote to detect.
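The Diarization section lists three places the HuggingFace token can come from, but it does not say which one wins when more than one is set. The sketch below assumes the `--auth_token` CLI value takes priority, then `config.json5`, then the `HK_AUTH_TOKEN` environment variable. The function name `resolve_auth_token` and the `auth_token` config key are hypothetical and not taken from the project.

```python
import os
from typing import Mapping, Optional


def resolve_auth_token(
    cli_token: Optional[str],
    config: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the HuggingFace token from the first source that provides one.

    Assumed precedence (not stated in the docs): --auth_token on the CLI,
    then a hypothetical "auth_token" key loaded from config.json5, then the
    HK_AUTH_TOKEN environment variable.
    """
    # 1. Explicit command line value, e.g. `cli.py --auth_token <token>`
    if cli_token:
        return cli_token
    # 2. Value read from config.json5 (the key name is an assumption)
    config_token = config.get("auth_token")
    if isinstance(config_token, str) and config_token:
        return config_token
    # 3. Environment variable named in the documentation
    env_token = env.get("HK_AUTH_TOKEN")
    if env_token:
        return env_token
    # No token anywhere: the Pyannote model cannot be loaded
    return None


if __name__ == "__main__":
    # Under the assumed precedence, the CLI value beats the environment variable
    print(resolve_auth_token("hf_cli", {}, {"HK_AUTH_TOKEN": "hf_env"}))  # hf_cli
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing token should disable diarization or abort the run.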
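The three speaker-count options map naturally onto the keyword arguments accepted when calling a pyannote.audio diarization pipeline (`num_speakers`, `min_speakers`, `max_speakers`). The sketch below shows one plausible translation. The function name `diarization_kwargs` is hypothetical. Treating 0 as "not set" for the min/max options, and letting a fixed count override the bounds, are assumptions. Only the meaning of 0 for Diarization - Speakers comes from the documentation.

```python
from typing import Dict


def diarization_kwargs(speakers: int = 0, min_speakers: int = 0, max_speakers: int = 0) -> Dict[str, int]:
    """Translate the speaker options into keyword arguments for a
    pyannote.audio pipeline call, e.g. pipeline(audio_file, **kwargs).

    Assumptions: 0 means "not set" for every option, and a fixed speaker
    count makes the min/max bounds irrelevant.
    """
    for name, value in (("speakers", speakers), ("min_speakers", min_speakers), ("max_speakers", max_speakers)):
        if value < 0:
            raise ValueError(f"{name} must be >= 0, got {value}")

    # "Diarization - Speakers": a positive value fixes the speaker count
    if speakers > 0:
        return {"num_speakers": speakers}

    # Reject contradictory bounds before building the arguments
    if min_speakers > 0 and max_speakers > 0 and min_speakers > max_speakers:
        raise ValueError("min_speakers cannot exceed max_speakers")

    kwargs: Dict[str, int] = {}
    # "Diarization - Min/Max Speakers": optional bounds for automatic detection
    if min_speakers > 0:
        kwargs["min_speakers"] = min_speakers
    if max_speakers > 0:
        kwargs["max_speakers"] = max_speakers
    # An empty dict leaves the speaker count entirely to Pyannote
    return kwargs


if __name__ == "__main__":
    print(diarization_kwargs(speakers=0, min_speakers=2, max_speakers=4))  # {'min_speakers': 2, 'max_speakers': 4}
```

Passing nothing when every option is 0 matches the documented behavior that Pyannote then estimates the number of speakers on its own.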
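The documentation says detected speakers are shown as (SPEAKER 00), (SPEAKER 01), and so on, but it does not say how diarization turns are matched to transcript segments. A common approach, assumed here and not taken from the project, is to give each transcript segment the speaker whose turns overlap it for the longest total time. The helper names and the `Turn` tuple layout are hypothetical. The input labels use the `SPEAKER_00` form that pyannote.audio typically produces.

```python
from typing import Dict, List, Optional, Tuple

# (start_seconds, end_seconds, speaker_label) for one diarization turn;
# labels are assumed to look like "SPEAKER_00", "SPEAKER_01", ...
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def speaker_for_segment(start: float, end: float, turns: List[Turn]) -> Optional[str]:
    """Return the speaker with the most total overlap with [start, end]."""
    totals: Dict[str, float] = {}
    for t_start, t_end, label in turns:
        shared = overlap(start, end, t_start, t_end)
        if shared > 0:
            totals[label] = totals.get(label, 0.0) + shared
    if not totals:
        # Nobody was speaking during this segment
        return None
    return max(totals, key=totals.__getitem__)


def label_text(text: str, speaker: Optional[str]) -> str:
    """Prefix a transcript line with a "(SPEAKER 00)"-style label."""
    if speaker is None:
        return text
    return f"({speaker.replace('_', ' ')}) {text}"


if __name__ == "__main__":
    turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]
    # The segment 3.0-8.0 overlaps SPEAKER_00 for 1 s and SPEAKER_01 for 4 s
    speaker = speaker_for_segment(3.0, 8.0, turns)
    print(label_text("Hello there.", speaker))  # (SPEAKER 01) Hello there.
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the choice stable when one speaker's turn is split into several short pieces within a segment.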