Update README.md #11
by TanelAlumae - opened

README.md CHANGED
@@ -133,7 +133,7 @@ widget:
 
 ## Model description
 
-This is a spoken language recognition model trained on the VoxLingua107 dataset using SpeechBrain.
+This is a spoken language recognition model trained on the [VoxLingua107 dataset](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/) using SpeechBrain.
 The model uses the ECAPA-TDNN architecture that has previously been used for speaker recognition. However, it uses
 more fully connected hidden layers after the embedding layer, and cross-entropy loss was used for training.
 We observed that this improved the performance of extracted utterance embeddings for downstream tasks.
@@ -259,7 +259,7 @@ The model has two uses:
 - use as an utterance-level feature (embedding) extractor, for creating a dedicated language ID model on your own data
 
 The model is trained on automatically collected YouTube data. For more
-information about the dataset, see [here](
+information about the dataset, see [here](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 
 #### How to use
@@ -330,7 +330,7 @@ Since the model is trained on VoxLingua107, it has many limitations and biases,
 
 ## Training data
 
-The model is trained on [VoxLingua107](
+The model is trained on [VoxLingua107](https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/).
 
 VoxLingua107 is a speech dataset for training spoken language identification models.
 The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according the language of the video title and description, with some post-processing steps to filter out false positives.
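
The hunks above touch the "Model description" and "Training data" sections, but the README's own "How to use" code is not part of this diff. As a point of reference only, a minimal sketch of the two uses the card describes (direct language ID and utterance-embedding extraction) via SpeechBrain's `EncoderClassifier` interface might look like the following; the Hub model ID, save directory, and audio file name are assumptions for illustration, not taken from the README.

```python
# Minimal sketch, not the README's own example. Assumes the model is published
# on the Hugging Face Hub as "speechbrain/lang-id-voxlingua107-ecapa" (assumed ID)
# and that a local file "example.wav" exists.
from speechbrain.pretrained import EncoderClassifier

language_id = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",  # assumed Hub model ID
    savedir="tmp_lang_id",
)

# Use 1: spoken language identification on a single utterance.
signal = language_id.load_audio("example.wav")
prediction = language_id.classify_batch(signal)
print(prediction[3])  # decoded language label(s)

# Use 2: utterance-level embeddings, e.g. for training a dedicated
# language ID model on your own data.
embeddings = language_id.encode_batch(signal)
print(embeddings.shape)  # (batch, 1, embedding_dim)
```

In SpeechBrain, `classify_batch` returns the posterior matrix, the best score, the predicted index, and the decoded label, while `encode_batch` returns the fixed-dimensional ECAPA-TDNN utterance embedding that the description above recommends reusing for downstream tasks.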