smajumdar94 committed
Commit 3c99844
1 Parent(s): 5f243ee

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -114,7 +114,7 @@ img {
 | [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
 
 
-This model extracts speaker embeddings from given speech, which are backbone for speaker verification and diarization tasks.
+This model extracts speaker embeddings from given speech, which is the backbone for speaker verification and diarization tasks.
 It is a "large" version of TitaNet (around 23M parameters) models.
 See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/models.html#titanet) for complete architecture details.
 
@@ -141,7 +141,7 @@ speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained("nvidia/
 Using
 
 ```python
-emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
+emb = speaker_model.get_embedding("an255-fash-b.wav")
 ```
 
 ### Verifying two utterances (Speaker Verification)
@@ -149,7 +149,7 @@ emb = speaker_model.get_embedding("nvidia/an255-fash-b.wav")
 Now to check if two audio files are from the same speaker or not, simply do:
 
 ```python
-speaker_model.verify_speakers("nvidia/an255-fash-b.wav","nvidia/cen7-fash-b.wav")
+speaker_model.verify_speakers("an255-fash-b.wav","cen7-fash-b.wav")
 ```
 
 ### Extracting Embeddings for more audio files
@@ -161,6 +161,7 @@ Write audio files to a `manifest.json` file with lines as in format:
 ```json
 {"audio_filepath": "<absolute path to dataset>/audio_file.wav", "duration": "duration of file in sec", "label": "speaker_id"}
 ```
+
 Then running following script will extract embeddings and writes to current working directory:
 ```shell
 python <NeMo_root>/examples/speaker_tasks/recognition/extract_speaker_embeddings.py --manifest=manifest.json
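For context, the updated snippets fit together roughly as sketched below. This is a minimal sketch, assuming `nemo_toolkit[asr]` is installed and the two wav files sit in the working directory; the full repository id is truncated after `"nvidia/"` in the hunk header above, so the id used here is an assumption.

```python
# Sketch of the workflow the updated README describes.
# Assumes nemo_toolkit[asr] is installed and that an255-fash-b.wav and
# cen7-fash-b.wav exist locally. The repository id below is an assumption,
# since the diff truncates it after "nvidia/".
import nemo.collections.asr as nemo_asr

speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    "nvidia/speakerverification_en_titanet_large"  # assumed full id
)

# Extract a speaker embedding from a single utterance.
emb = speaker_model.get_embedding("an255-fash-b.wav")
print(emb.shape)

# Decide whether two utterances come from the same speaker.
same_speaker = speaker_model.verify_speakers("an255-fash-b.wav", "cen7-fash-b.wav")
print(same_speaker)
```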
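The batch-extraction step expects the manifest format shown in the diff, one JSON object per line. The following is a hypothetical helper for writing such a manifest; the file names and speaker labels are placeholders, and the duration is computed as a float in seconds.

```python
# Hypothetical helper for building the manifest.json described above:
# one JSON object per line with audio_filepath, duration (seconds) and a
# speaker label. File paths and labels here are placeholders.
import json
import wave
from pathlib import Path

def wav_duration(path: str) -> float:
    """Duration of a PCM wav file in seconds."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()

files = {
    "an255-fash-b.wav": "fash",  # placeholder speaker ids
    "cen7-fash-b.wav": "fash",
}

with open("manifest.json", "w") as out:
    for name, speaker_id in files.items():
        path = str(Path(name).resolve())  # README asks for absolute paths
        entry = {
            "audio_filepath": path,
            "duration": wav_duration(path),
            "label": speaker_id,
        }
        out.write(json.dumps(entry) + "\n")
```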