smajumdar94 committed
Commit 3c99844 • Parent(s): 5f243ee
Update README.md

README.md CHANGED
@@ -114,7 +114,7 @@ img {
 [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
 
 
-This model extracts speaker embeddings from given speech, which
+This model extracts speaker embeddings from given speech, which is the backbone for speaker verification and diarization tasks.
 It is a "large" version of TitaNet (around 23M parameters) models.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/models.html#titanet) for complete architecture details.
 
@@ -141,7 +141,7 @@ speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("nvidia/
 Using
 
 ```python
-emb = speaker_model.get_embedding("
+emb = speaker_model.get_embedding("an255-fash-b.wav")
 ```
 
 ### Verifying two utterances (Speaker Verification)
@@ -149,7 +149,7 @@ emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
 Now to check if two audio files are from the same speaker or not, simply do:
 
 ```python
-speaker_model.verify_speakers("
+speaker_model.verify_speakers("an255-fash-b.wav","cen7-fash-b.wav")
 ```
 
 ### Extracting Embeddings for more audio files
@@ -161,6 +161,7 @@ Write audio files to a `manifest.json` file with lines as in format:
 ```json
 {"audio_filepath": "<absolute path to dataset>/audio_file.wav", "duration": "duration of file in sec", "label": "speaker_id"}
 ```
+
 Then running following script will extract embeddings and writes to current working directory:
 ```shell
 python <NeMo_root>/examples/speaker_tasks/recognition/extract_speaker_embeddings.py --manifest=manifest.json
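The `verify_speakers` call in the diff above reduces to comparing the two extracted embeddings by cosine similarity against a decision threshold. A minimal pure-Python sketch of that comparison — the 0.7 threshold is an assumption about NeMo's default, and the toy 3-dimensional vectors stand in for real TitaNet embeddings (which are much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def same_speaker(emb1, emb2, threshold=0.7):
    # threshold is an assumed default; tune it on your own verification trials
    return cosine_similarity(emb1, emb2) >= threshold

# Toy vectors for illustration only, not real speaker embeddings.
print(same_speaker([1.0, 0.0, 1.0], [0.9, 0.1, 1.1]))  # similar directions -> True
```

In practice you would pass the outputs of two `speaker_model.get_embedding(...)` calls instead of these placeholder vectors.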
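The manifest format shown in the last hunk is JSON Lines: one object per audio file. A stdlib-only sketch for generating it — the file paths, durations, and speaker labels below are hypothetical placeholders:

```python
import json

def write_manifest(entries, path="manifest.json"):
    """Write one JSON object per line, as the extraction script expects."""
    with open(path, "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")

# Placeholder entries; substitute absolute paths to your own audio files.
entries = [
    {"audio_filepath": "/data/audio_file.wav", "duration": 4.2, "label": "speaker_0"},
    {"audio_filepath": "/data/audio_file2.wav", "duration": 3.1, "label": "speaker_1"},
]
write_manifest(entries)
```

The resulting `manifest.json` can then be passed to `extract_speaker_embeddings.py` via `--manifest` as in the diff above.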