speechbrainteam committed 2d8381e (parent: d5b7184)

Update README.md

Files changed (1): README.md (+41 −46)
README.md CHANGED
@@ -3,16 +3,16 @@ language: "en"
 thumbnail:
 tags:
 - embeddings
-- Commands
 - Keywords
 - Keyword Spotting
 - pytorch
-- xvectors
 - TDNN
 - Command Recognition
 license: "apache-2.0"
 datasets:
-- google speech commands
 metrics:
 - Accuracy
 
@@ -21,25 +21,25 @@ metrics:
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
 
-# Command Recognition with xvector embeddings on Google Speech Commands
-
-This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
-You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands)
-The dataset provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:
 ```
-'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence'
 ```
 
 For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:
 
-| Release | Accuracy(%)
 |:-------------:|:--------------:|
-| 06-02-21 | 98.14 |
 
 
 ## Pipeline description
-This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
 
 ## Install SpeechBrain
 
@@ -52,15 +52,13 @@ pip install speechbrain
 Please note that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
-### Perform Command Recognition
 
 ```python
 import torchaudio
 from speechbrain.pretrained import EncoderClassifier
-classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
-out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
-print(text_lab)
-out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
 print(text_lab)
 ```
 
@@ -68,7 +66,7 @@ print(text_lab)
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 
 ### Training
-The model was trained with SpeechBrain (b7ff9dc4).
 To train it from scratch, follow these steps:
 1. Clone SpeechBrain:
 ```bash
@@ -83,43 +81,40 @@ pip install -e .
 
 3. Run Training:
 ```
-cd recipes/Google-speech-commands
-python train.py hparams/xvect.yaml --data_folder=your_data_folder
 ```
 
-You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).
 
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
-#### Referencing xvectors
-```@inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
-author = {David Snyder and
-Daniel Garcia{-}Romero and
-Alan McCree and
-Gregory Sell and
-Daniel Povey and
-Sanjeev Khudanpur},
-title = {Spoken Language Recognition using X-vectors},
-booktitle = {Odyssey 2018},
-pages = {105--111},
-year = {2018},
 }
 ```
 
-#### Referencing Google Speech Commands
-```@article{speechcommands,
-author = { {Warden}, P.},
-title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
-journal = {ArXiv e-prints},
-archivePrefix = "arXiv",
-eprint = {1804.03209},
-primaryClass = "cs.CL",
-keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
-year = 2018,
-month = apr,
-url = {https://arxiv.org/abs/1804.03209},
-}
 ```
 
@@ -132,7 +127,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
 year = {2021},
 publisher = {GitHub},
 journal = {GitHub repository},
-howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
 }
 ```
 
 
 thumbnail:
 tags:
 - embeddings
+- Sound
 - Keywords
 - Keyword Spotting
 - pytorch
+- ECAPA-TDNN
 - TDNN
 - Command Recognition
 license: "apache-2.0"
 datasets:
+- UrbanSound8k
 metrics:
 - Accuracy
 
 
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
 
+# Sound Recognition with ECAPA embeddings on UrbanSound8k
+
+This repository provides all the necessary tools to perform sound recognition with SpeechBrain using a model pretrained on UrbanSound8k.
+You can download the dataset [here](https://urbansounddataset.weebly.com/urbansound8k.html).
+The provided system can recognize the following 10 sound classes:
 ```
+dog_bark, children_playing, air_conditioner, street_music, gun_shot, siren, engine_idling, jackhammer, drilling, car_horn
 ```
 
 For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:
 
+| Release | Accuracy 1-fold (%) |
 |:-------------:|:--------------:|
+| 04-06-21 | 75.5 |
 
 
 ## Pipeline description
+This system is composed of an ECAPA model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
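
The statistics-pooling step named above can be sketched in a few lines. This is an illustrative NumPy version under assumed shapes (192-dimensional frame embeddings), not SpeechBrain's actual pooling layer: it collapses a variable-length sequence of frame embeddings into one fixed-size utterance vector by concatenating the per-channel mean and standard deviation.

```python
import numpy as np

def stat_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse [T, C] frame embeddings into a fixed-size [2*C] vector."""
    mean = frames.mean(axis=0)  # per-channel mean over time
    std = frames.std(axis=0)    # per-channel standard deviation over time
    return np.concatenate([mean, std])

# 200 frames of hypothetical 192-dim embeddings -> one 384-dim vector,
# regardless of how many frames the clip produced.
frames = np.random.randn(200, 192)
print(stat_pool(frames).shape)  # (384,)
```

Because the pooled vector's size no longer depends on the clip length, the classifier on top can be an ordinary fixed-input network.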
 
 ## Install SpeechBrain
 
 Please note that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
+### Perform Sound Recognition
 
 ```python
 import torchaudio
 from speechbrain.pretrained import EncoderClassifier
+classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/urbansound8k_ecapa")
+out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
 print(text_lab)
 ```
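
For context on the four values `classify_file` returns, the final decision amounts to an argmax over class posteriors. Here is a standalone, hedged sketch with made-up probabilities (not SpeechBrain's internals):

```python
# The 10 UrbanSound8k classes this model predicts
labels = ["dog_bark", "children_playing", "air_conditioner", "street_music",
          "gun_shot", "siren", "engine_idling", "jackhammer", "drilling", "car_horn"]

# Hypothetical posteriors for one clip (illustrative numbers only)
out_prob = [0.91, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005]

index = max(range(len(out_prob)), key=out_prob.__getitem__)  # winning class id
score = out_prob[index]                                      # its posterior
text_lab = labels[index]                                     # its readable label
print(text_lab, score)  # dog_bark 0.91
```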
 
 
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
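
When the target machine may or may not have a GPU, the `run_opts` dictionary can be built defensively. This is a small sketch, not something the API requires:

```python
# Choose the inference device without assuming CUDA (or even torch) is present.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # torch missing in this environment
    device = "cpu"

run_opts = {"device": device}
print(run_opts)
# Then pass it along: EncoderClassifier.from_hparams(..., run_opts=run_opts)
```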
 
 ### Training
+The model was trained with SpeechBrain (8cab8b0c).
 To train it from scratch, follow these steps:
 1. Clone SpeechBrain:
 ```bash
 
 
 3. Run Training:
 ```
+cd recipes/UrbanSound8k/SoundClassification
+python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
 ```
 
+You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1sItfg_WNuGX6h2dCs8JTGq2v2QoNTaUg?usp=sharing).
 
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
+#### Referencing ECAPA
+```
+@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
+  author = {Brecht Desplanques and Jenthe Thienpondt and Kris Demuynck},
+  editor = {Helen Meng and Bo Xu and Thomas Fang Zheng},
+  title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation in {TDNN} Based Speaker Verification},
+  booktitle = {Interspeech 2020},
+  pages = {3830--3834},
+  publisher = {{ISCA}},
+  year = {2020},
 }
 ```
 
+#### Referencing UrbanSound
+```
+@inproceedings{Salamon:UrbanSound:ACMMM:14,
+  Author = {Salamon, J. and Jacoby, C. and Bello, J. P.},
+  Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)},
+  Month = {Nov.},
+  Pages = {1041--1044},
+  Title = {A Dataset and Taxonomy for Urban Sound Research},
+  Year = {2014}}
 ```
 
 
 year = {2021},
 publisher = {GitHub},
 journal = {GitHub repository},
+howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
 }
 ```