nehulagrawal committed
Commit 019fd18 · Parent: 8c6f372

Update README.md

Files changed (1): README.md (+70 −1)
@@ -25,8 +25,77 @@ It achieves the following results on the evaluation set:

## Model description

This segmentation model has been trained on English data (Callhome) using diarizers. It can be loaded with two lines of code:

```python
from diarizers import SegmentationModel

segmentation_model = SegmentationModel().from_pretrained("nehulagrawal/speaker-segmentation-eng")
```

To use it within a pyannote speaker diarization pipeline, load the [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline and convert the model to a pyannote-compatible format:

```python
import torch

from diarizers import SegmentationModel
from pyannote.audio import Pipeline

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

# load the pre-trained pyannote pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
pipeline.to(device)

# load the fine-tuned segmentation model, convert it to pyannote format,
# and swap it into the pipeline
model = SegmentationModel().from_pretrained("nehulagrawal/speaker-segmentation-eng")
model = model.to_pyannote_model()
pipeline._segmentation.model = model.to(device)
```

You can now use the pipeline on audio examples:

```python
from datasets import load_dataset

# load a dataset example
dataset = load_dataset("diarizers-community/callhome", "eng", split="data")
sample = dataset[0]["audio"]

# pre-process inputs: pyannote expects a (channel, time) waveform tensor
sample["waveform"] = torch.from_numpy(sample.pop("array")[None, :]).to(device, dtype=model.dtype)
sample["sample_rate"] = sample.pop("sampling_rate")

# perform inference
diarization = pipeline(sample)

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```
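The RTTM file written above is a plain space-separated text format, one speaker turn per line. As an illustrative sketch (the `parse_rttm` helper and the sample line below are assumptions for demonstration, not part of diarizers or pyannote), the segments can be read back with the standard library alone:

```python
# Minimal RTTM reader. Each SPEAKER line has ten space-separated fields:
# SPEAKER <file-id> <channel> <onset> <duration> <ortho> <stype> <speaker> <conf> <slat>
def parse_rttm(text):
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-SPEAKER lines
        segments.append({
            "file": fields[1],
            "onset": float(fields[3]),
            "duration": float(fields[4]),
            "speaker": fields[7],
        })
    return segments

# hypothetical line in the shape the pipeline writes
example = "SPEAKER audio 1 0.031 1.562 <NA> <NA> SPEAKER_00 <NA> <NA>"
print(parse_rttm(example))
# → [{'file': 'audio', 'onset': 0.031, 'duration': 1.562, 'speaker': 'SPEAKER_00'}]
```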

You can also run the pipeline directly on a single audio file:

```python
import torch

from diarizers import SegmentationModel
from pyannote.audio import Pipeline

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

# load the pre-trained pyannote pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
pipeline.to(device)

# swap in the fine-tuned segmentation model
model = SegmentationModel().from_pretrained("nehulagrawal/speaker-segmentation-eng")
model = model.to_pyannote_model()
pipeline._segmentation.model = model.to(device)

# run diarization on a local file and save the result in RTTM format
diarization = pipeline("audio.wav")
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```

## Intended uses & limitations

More information needed