--- language: "en" thumbnail: tags: - embeddings - Sound - Keywords - Keyword Spotting - pytorch - ECAPA-TDNN - TDNN - Command Recognition license: "apache-2.0" datasets: - Urbansound8k metrics: - Accuracy ---

# Command Recognition with ECAPA embeddings on UrbanSoudnd8k This repository provides all the necessary tools to perform sound recognition with SpeechBrain using a model pretrained on UrbanSound8k. You can download the dataset [here](https://urbansounddataset.weebly.com/urbansound8k.html) The provided system can recognize the following 10 keywords: ``` dog_bark, children_playing, air_conditioner, street_music, gun_shot, siren, engine_idling, jackhammer, drilling, car_horn ``` For a better experience, we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is: | Release | Accuracy 1-fold (%) |:-------------:|:--------------:| | 04-06-21 | 75.5 | ## Pipeline description This system is composed of a ECAPA model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that. ## Install SpeechBrain First of all, please install SpeechBrain with the following command: ``` pip install speechbrain ``` Please notice that we encourage you to read our tutorials and learn more about [SpeechBrain](https://speechbrain.github.io). ### Perform Sound Recognition ```python import torchaudio from speechbrain.pretrained import EncoderClassifier classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/gurbansound8k_ecapa") out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav') print(text_lab) ``` ### Inference on GPU To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method. ### Training The model was trained with SpeechBrain (8cab8b0c). To train it from scratch follows these steps: 1. Clone SpeechBrain: ```bash git clone https://github.com/speechbrain/speechbrain/ ``` 2. Install it: ``` cd speechbrain pip install -r requirements.txt pip install -e . ``` 3. Run Training: ``` cd recipes/UrbanSound8k/SoundClassification python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder ``` You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1sItfg_WNuGX6h2dCs8JTGq2v2QoNTaUg?usp=sharing). ### Limitations The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets. #### Referencing ECAPA ```@inproceedings{DBLP:conf/interspeech/DesplanquesTD20, author = {Brecht Desplanques and Jenthe Thienpondt and Kris Demuynck}, editor = {Helen Meng and Bo Xu and Thomas Fang Zheng}, title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation in {TDNN} Based Speaker Verification}, booktitle = {Interspeech 2020}, pages = {3830--3834}, publisher = {{ISCA}}, year = {2020}, } ``` #### Referencing UrbanSound ```@inproceedings{Salamon:UrbanSound:ACMMM:14, Author = {Salamon, J. and Jacoby, C. and Bello, J. P.}, Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)}, Month = {Nov.}, Pages = {1041--1044}, Title = {A Dataset and Taxonomy for Urban Sound Research}, Year = {2014}} ``` #### Referencing SpeechBrain ``` @misc{SB2021, author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua }, title = {SpeechBrain}, year = {2021}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\\\\url{https://github.com/speechbrain/speechbrain}}, } ``` #### About SpeechBrain SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. Website: https://speechbrain.github.io/ GitHub: https://github.com/speechbrain/speechbrain