rishiraj hbredin committed on
Commit 2abb322
0 Parent(s)

Duplicate from pyannote/wespeaker-voxceleb-resnet34-LM

Co-authored-by: Hervé Bredin <hbredin@users.noreply.huggingface.co>

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +111 -0
  3. config.yaml +10 -0
  4. pytorch_model.bin +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
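The `.gitattributes` rules above route matching files through Git LFS, which is why `pytorch_model.bin` appears later in this commit as a three-line pointer rather than as binary content. As a rough illustration only, the sketch below checks a file name against a few of those patterns. It assumes simple basename globbing via Python's `fnmatch`, which approximates but does not reproduce Git's own pattern rules (directory patterns such as `saved_model/**/*` are deliberately left out). The helper name `is_lfs_tracked` and the pattern subset are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A hand-picked subset of the basename patterns from the .gitattributes diff.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.onnx", "*.pt", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does the file's basename match any LFS glob?

    Assumption: basename globbing only. Git's real attribute matching has
    additional rules that this sketch does not implement.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in ["pytorch_model.bin", "README.md", "config.yaml"]:
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation, only `pytorch_model.bin` among the four changed files matches, which is consistent with the other three being stored as plain text.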
README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ tags:
+ - pyannote
+ - pyannote-audio
+ - pyannote-audio-model
+ - wespeaker
+ - audio
+ - voice
+ - speech
+ - speaker
+ - speaker-recognition
+ - speaker-verification
+ - speaker-identification
+ - speaker-embedding
+ datasets:
+ - voxceleb
+ license: cc-by-4.0
+ inference: false
+ ---
+
+ Using this open-source model in production?
+ Consider switching to [pyannoteAI](https://www.pyannote.ai) for better and faster options.
+
+ # 🎹 Wrapper around wespeaker-voxceleb-resnet34-LM
+
+ This model requires `pyannote.audio` version 3.1 or higher.
+
+ This is a wrapper around [WeSpeaker](https://github.com/wenet-e2e/wespeaker) `wespeaker-voxceleb-resnet34-LM` pretrained speaker embedding model, for use in `pyannote.audio`.
+
+ ## Basic usage
+
+ ```python
+ # instantiate pretrained model
+ from pyannote.audio import Model
+ model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")
+ ```
+
+ ```python
+ from pyannote.audio import Inference
+ inference = Inference(model, window="whole")
+ embedding1 = inference("speaker1.wav")
+ embedding2 = inference("speaker2.wav")
+ # `embeddingX` is (1 x D) numpy array extracted from the file as a whole.
+
+ from scipy.spatial.distance import cdist
+ distance = cdist(embedding1, embedding2, metric="cosine")[0,0]
+ # `distance` is a `float` describing how dissimilar speakers 1 and 2 are.
+ ```
+
+ ## Advanced usage
+
+ ### Running on GPU
+
+ ```python
+ import torch
+ inference.to(torch.device("cuda"))
+ embedding = inference("audio.wav")
+ ```
+
+ ### Extract embedding from an excerpt
+
+ ```python
+ from pyannote.audio import Inference
+ from pyannote.core import Segment
+ inference = Inference(model, window="whole")
+ excerpt = Segment(13.37, 19.81)
+ embedding = inference.crop("audio.wav", excerpt)
+ # `embedding` is (1 x D) numpy array extracted from the file excerpt.
+ ```
+
+ ### Extract embeddings using a sliding window
+
+ ```python
+ from pyannote.audio import Inference
+ inference = Inference(model, window="sliding",
+                       duration=3.0, step=1.0)
+ embeddings = inference("audio.wav")
+ # `embeddings` is a (N x D) pyannote.core.SlidingWindowFeature
+ # `embeddings[i]` is the embedding of the ith position of the
+ # sliding window, i.e. from [i * step, i * step + duration].
+ ```
+
+ ## License
+
+ According to [this page](https://github.com/wenet-e2e/wespeaker/blob/master/docs/pretrained.md):
+
+ > The pretrained model in WeNet follows the license of it's corresponding dataset. For example, the pretrained model on VoxCeleb follows Creative Commons Attribution 4.0 International License., since it is used as license of the VoxCeleb dataset, see https://mm.kaist.ac.kr/datasets/voxceleb/.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{Wang2023,
+ title={Wespeaker: A research and production oriented speaker embedding learning toolkit},
+ author={Wang, Hongji and Liang, Chengdong and Wang, Shuai and Chen, Zhengyang and Zhang, Binbin and Xiang, Xu and Deng, Yanlei and Qian, Yanmin},
+ booktitle={ICASSP 2023, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
+ pages={1--5},
+ year={2023},
+ organization={IEEE}
+ }
+ ```
+
+ ```bibtex
+ @inproceedings{Bredin23,
+ author={Hervé Bredin},
+ title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
+ year=2023,
+ booktitle={Proc. INTERSPEECH 2023},
+ pages={1983--1987},
+ doi={10.21437/Interspeech.2023-105}
+ }
+ ```
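The basic-usage snippet in the README compares two embeddings with `cdist(..., metric="cosine")`. For readers who want to see what that number means, here is a minimal dependency-free sketch of cosine distance, defined as one minus the cosine similarity of the two vectors. The function name `cosine_distance` and the toy vectors are illustrative and not part of the commit; real embeddings would come from `Inference` as shown in the README.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v) for two equal-length vectors.

    0.0 means same direction, 1.0 means orthogonal, 2.0 means opposite.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for speaker embeddings.
    print(cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
    print(cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because the measure depends only on direction, scaling an embedding does not change the distance. Any decision threshold for "same speaker" has to be tuned on held-out data and is not specified by this commit.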
config.yaml ADDED
@@ -0,0 +1,10 @@
+ model:
+   _target_: pyannote.audio.models.embedding.WeSpeakerResNet34
+   sample_rate: 16000
+   num_channels: 1
+   num_mel_bins: 80
+   frame_length: 25
+   frame_shift: 10
+   dither: 0.0
+   window_type: hamming
+   use_energy: false
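The `frame_length` and `frame_shift` values in `config.yaml` only become concrete once combined with `sample_rate`. The sketch below works through that arithmetic under two assumptions that the commit does not state: that both values are in milliseconds (the usual Kaldi-style filterbank convention), and that only frames fitting entirely inside the signal are kept. The function names are hypothetical.

```python
def frame_geometry(sample_rate=16000, frame_length_ms=25, frame_shift_ms=10):
    """Convert frame length and shift from milliseconds (assumed) to samples."""
    window = sample_rate * frame_length_ms // 1000
    hop = sample_rate * frame_shift_ms // 1000
    return window, hop


def num_frames(num_samples, window, hop):
    """Count frames that fit entirely within the signal (assumed edge handling)."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


if __name__ == "__main__":
    window, hop = frame_geometry()
    print(window, hop)                     # samples per frame, samples per shift
    print(num_frames(16000, window, hop))  # frames in one second of 16 kHz audio
```

With the values from the config, this gives 400-sample windows advanced by 160 samples, and each frame would carry `num_mel_bins: 80` filterbank coefficients.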
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:366edf44f4c80889a3eb7a9d7bdf02c4aede3127f7dd15e274dcdb826b143c56
+ size 26645418
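The three lines added for `pytorch_model.bin` are a Git LFS pointer: a spec version, a content hash, and a byte size. A downloaded checkpoint can be checked against them. The following is a minimal sketch using only the standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are illustrative, and for a file of about 26 MB a real script might prefer to hash in chunks rather than read everything into memory.

```python
import hashlib

# Pointer text as it appears in the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:366edf44f4c80889a3eb7a9d7bdf02c4aede3127f7dd15e274dcdb826b143c56
size 26645418
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines into a dict with the hash algorithm separated out."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["algo"], pointer["size"])
```

To verify a local copy, read the file in binary mode and pass its bytes to `matches_pointer` along with the parsed pointer.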