objects76 committed on
Commit d6695c1
1 Parent(s): f5ed452

source: ./README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -21,7 +21,7 @@ extra_gated_fields:
   Website: text
 ---
 
-Using this open-source model in production?
+Using this open-source model in production?
 Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.
 
 # 🎹 "Powerset" speaker segmentation
@@ -33,7 +33,7 @@ This model ingests 10 seconds of mono audio sampled at 16kHz and outputs speaker
 ```python
 # waveform (first row)
 duration, sample_rate, num_channels = 10, 16000, 1
-waveform = torch.randn(batch_size, num_channels, duration * sample_rate)
+waveform = torch.randn(batch_size, num_channels, duration * sample_rate)
 
 # powerset multi-class encoding (second row)
 powerset_encoding = model(waveform)
@@ -42,7 +42,7 @@ powerset_encoding = model(waveform)
 from pyannote.audio.utils.powerset import Powerset
 max_speakers_per_chunk, max_speakers_per_frame = 3, 2
 to_multilabel = Powerset(
-    max_speakers_per_chunk,
+    max_speakers_per_chunk,
     max_speakers_per_frame).to_multilabel
 multilabel_encoding = to_multilabel(powerset_encoding)
 ```
@@ -66,13 +66,13 @@ This [companion repository](https://github.com/FrenchKrab/IS2023-powerset-diariz
 # instantiate the model
 from pyannote.audio import Model
 model = Model.from_pretrained(
-    "pyannote/segmentation-3.0",
+    "pyannote/segmentation-3.0",
     use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
 ```
 
 ### Speaker diarization
 
-This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks).
+This model cannot be used to perform speaker diarization of full recordings on its own (it only processes 10s chunks).
 
 See [pyannote/speaker-diarization-3.0](https://hf.co/pyannote/speaker-diarization-3.0) pipeline that uses an additional speaker embedding model to perform full recording speaker diarization.
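As a quick sanity check on the tensor sizes in the snippet touched by this diff: the model expects 10 seconds of mono audio at 16 kHz arranged as a `(batch, channel, sample)` tensor. The README snippet uses `batch_size` without defining it, so the value below is a hypothetical choice for illustration (plain Python, no torch needed):

```python
# Input expected by the model: 10 seconds of mono audio sampled at 16 kHz,
# arranged as a (batch, channel, sample) tensor.
duration, sample_rate, num_channels = 10, 16000, 1
batch_size = 4  # hypothetical; the README snippet leaves batch_size undefined

num_samples = duration * sample_rate  # samples per 10 s chunk
shape = (batch_size, num_channels, num_samples)
print(shape)  # (4, 1, 160000)
```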
 
 
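The `Powerset` conversion edited in this diff maps each of the model's powerset classes back to a per-speaker multi-label vector. Below is a minimal plain-Python sketch of that idea, not pyannote.audio's actual implementation: it assumes classes are enumerated from the empty set upward by subset size, which with `max_speakers_per_chunk=3` and `max_speakers_per_frame=2` gives 7 classes (silence, 3 single-speaker classes, 3 two-speaker classes). The function names and class ordering here are illustrative assumptions.

```python
from itertools import combinations

def powerset_classes(max_speakers_per_chunk: int, max_speakers_per_frame: int):
    """Enumerate speaker subsets, from the empty set up to
    max_speakers_per_frame simultaneous speakers (assumed ordering)."""
    classes = []
    for size in range(max_speakers_per_frame + 1):
        for subset in combinations(range(max_speakers_per_chunk), size):
            classes.append(subset)
    return classes

def to_multilabel(class_index: int, max_speakers_per_chunk: int = 3,
                  max_speakers_per_frame: int = 2):
    """Convert one powerset class index into a per-speaker 0/1 vector."""
    classes = powerset_classes(max_speakers_per_chunk, max_speakers_per_frame)
    label = [0] * max_speakers_per_chunk
    for speaker in classes[class_index]:
        label[speaker] = 1
    return label

print(len(powerset_classes(3, 2)))  # 7 classes
print(to_multilabel(4))             # e.g. the {0, 1} two-speaker class -> [1, 1, 0]
```

Applying `to_multilabel` frame by frame to the argmax of the model's powerset output is what turns the multi-class encoding into the multi-label encoding shown in the README snippet.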