urbansound8k_ecapa / README.md
speechbrainteam's picture
Update README.md
2d8381e
|
raw
history blame
4.65 kB
---
language: "en"
thumbnail:
tags:
- embeddings
- Sound
- Keywords
- Keyword Spotting
- pytorch
- ECAPA-TDNN
- TDNN
- Command Recognition
license: "apache-2.0"
datasets:
- Urbansound8k
metrics:
- Accuracy
---
<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>
# Command Recognition with ECAPA embeddings on UrbanSoudnd8k
This repository provides all the necessary tools to perform sound recognition with SpeechBrain using a model pretrained on UrbanSound8k.
You can download the dataset [here](https://urbansounddataset.weebly.com/urbansound8k.html)
The provided system can recognize the following 10 keywords:
```
dog_bark, children_playing, air_conditioner, street_music, gun_shot, siren, engine_idling, jackhammer, drilling, car_horn
```
For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:
| Release | Accuracy 1-fold (%)
|:-------------:|:--------------:|
| 04-06-21 | 75.5 |
## Pipeline description
This system is composed of a ECAPA model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
## Install SpeechBrain
First of all, please install SpeechBrain with the following command:
```
pip install speechbrain
```
Please notice that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).
### Perform Sound Recognition
```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier
classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/gurbansound8k_ecapa")
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
print(text_lab)
```
### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
### Training
The model was trained with SpeechBrain (8cab8b0c).
To train it from scratch follows these steps:
1. Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
2. Install it:
```
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
3. Run Training:
```
cd recipes/UrbanSound8k/SoundClassification
python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
```
You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1sItfg_WNuGX6h2dCs8JTGq2v2QoNTaUg?usp=sharing).
### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
#### Referencing ECAPA
```@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
author = {Brecht Desplanques and
Jenthe Thienpondt and
Kris Demuynck},
editor = {Helen Meng and
Bo Xu and
Thomas Fang Zheng},
title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
in {TDNN} Based Speaker Verification},
booktitle = {Interspeech 2020},
pages = {3830--3834},
publisher = {{ISCA}},
year = {2020},
}
```
#### Referencing UrbanSound
```@inproceedings{Salamon:UrbanSound:ACMMM:14,
Author = {Salamon, J. and Jacoby, C. and Bello, J. P.},
Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)},
Month = {Nov.},
Pages = {1041--1044},
Title = {A Dataset and Taxonomy for Urban Sound Research},
Year = {2014}}
```
#### Referencing SpeechBrain
```
@misc{SB2021,
author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
title = {SpeechBrain},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\\\\url{https://github.com/speechbrain/speechbrain}},
}
```
#### About SpeechBrain
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
Website: https://speechbrain.github.io/
GitHub: https://github.com/speechbrain/speechbrain