Update README.md
README.md CHANGED
@@ -30,6 +30,9 @@ The model uses the ECAPA-TDNN architecture that has previously been used for spe
 more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training.
 We observed that this improved the performance of extracted utterance embeddings for downstream tasks.
 
+The system is trained with recordings sampled at 16 kHz (single channel).
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file*, if needed.
+
 The model can classify a speech utterance according to the language spoken.
 It covers 107 different languages (
 Abkhazian,
@@ -199,6 +202,8 @@ print(emb.shape)
 ```
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 
+The system is trained with recordings sampled at 16 kHz (single channel).
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file*, if needed. Make sure your input tensor matches the expected sampling rate if you use *encode_batch* or *classify_batch*.
 
 #### Limitations and bias
 
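Taken together, the notes added in these two hunks describe the intended usage. Below is a minimal sketch; the SpeechBrain `EncoderClassifier` interface and the `speechbrain/lang-id-voxlingua107-ecapa` source id are assumptions (neither is named in this diff), and the wav path is a placeholder.

```python
from speechbrain.pretrained import EncoderClassifier

# run_opts={"device":"cuda"} runs inference on the GPU, as the README notes.
language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",  # assumed model id
    savedir="tmp",
    run_opts={"device": "cuda"},
)

# classify_file resamples to 16 kHz and selects a mono channel if needed,
# so an arbitrary wav file can be passed in directly.
prediction = language_id.classify_file("path/to/utterance.wav")  # placeholder
print(prediction)
```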
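In contrast to *classify_file*, the *encode_batch* and *classify_batch* methods take raw tensors and skip the automatic normalization, so resampling and channel selection are the caller's responsibility. Here is a sketch of that preprocessing, reusing the `language_id` classifier from the sketch above; torchaudio is an assumption, and any loader that yields a `[channels, time]` float tensor plus its sampling rate would work.

```python
import torchaudio

# Load audio as a [channels, time] float tensor and its sampling rate.
signal, fs = torchaudio.load("path/to/utterance.wav")  # placeholder

# Downmix to a single channel; the model was trained on mono recordings.
if signal.shape[0] > 1:
    signal = signal.mean(dim=0, keepdim=True)

# Resample to the 16 kHz rate the model expects.
if fs != 16000:
    signal = torchaudio.functional.resample(signal, fs, 16000)

# A [1, time] tensor acts as a batch of one utterance.
emb = language_id.encode_batch(signal)
print(emb.shape)
```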