speechbrain/lang-id-voxlingua107-ecapa

Audio Classification


Mirco committed on Nov 30, 2021

Commit 7e9e1bd (1 parent: 0612491)

Update README.md

Files changed (1)

README.md +5 -0


@@ -30,6 +30,9 @@ The model uses the ECAPA-TDNN architecture that has previously been used for spe
 more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training.
 We observed that this improved the performance of extracted utterance embeddings for downstream tasks.
+The system is trained with recordings sampled at 16kHz (single channel).
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed.
 The model can classify a speech utterance according to the language spoken.
 It covers 107 different languages (
 Abkhazian,
@@ -199,6 +202,8 @@ print(emb.shape)
 ```
 To perform inference on the GPU, add  `run_opts={"device":"cuda"}`  when calling the `from_hparams` method.
+The system is trained with recordings sampled at 16kHz (single channel).
+The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file* if needed. Make sure your input tensor is compliant with the expected sampling rate if you use *encode_batch* and *classify_batch*.
 #### Limitations and bias
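
The note added in this commit says that *classify_file* normalizes audio automatically, while *encode_batch* and *classify_batch* expect input that is already single-channel at 16 kHz. The sketch below illustrates what that preparation amounts to, using only the Python standard library. The helper names (`to_mono`, `resample_linear`, `prepare_for_batch`) are hypothetical and not part of SpeechBrain, and the naive linear interpolation is an assumption made for brevity: it is not the resampler SpeechBrain uses internally, and it applies no anti-aliasing filter.

```python
# Illustrative only: hypothetical helpers, not SpeechBrain code.
TARGET_SR = 16000  # sampling rate stated in the README change


def to_mono(channels):
    """Average equal-length channels (a list of lists of floats) into one channel."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Resample by plain linear interpolation (no anti-aliasing filter)."""
    if orig_sr == target_sr or len(samples) < 2:
        return list(samples)
    out_len = int(round(len(samples) * target_sr / orig_sr))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * orig_sr / target_sr  # fractional position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def prepare_for_batch(channels, orig_sr):
    """Downmix to mono, then bring the signal to 16 kHz."""
    return resample_linear(to_mono(channels), orig_sr)


# One second of stereo audio recorded at 8 kHz (synthetic values).
stereo_8k = [[0.5] * 8000, [-0.5] * 8000]
mono_16k = prepare_for_batch(stereo_8k, 8000)
print(len(mono_16k))
```

In practice a dedicated resampler (for example `torchaudio.functional.resample`) is a better choice than linear interpolation. The prepared samples would then be converted to a tensor with a leading batch dimension before being passed to *encode_batch* or *classify_batch*.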