--- language: "en" thumbnail: tags: - embeddings - Commands - Keywords - Keyword Spotting - pytorch - xvectors - TDNN - Command Recognition license: "apache-2.0" datasets: - google speech commands metrics: - Accuracy ---

# Command Recognition with xvector embeddings on Google Speech Commands This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands. You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands) The dataset provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords: ``` 'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence' ``` For a better experience, we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is: | Release | Accuracy(%) |:-------------:|:--------------:| | 06-02-21 | 98.14 | ## Pipeline description This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that. ## Install SpeechBrain First of all, please install SpeechBrain with the following command: ``` pip install speechbrain ``` Please notice that we encourage you to read our tutorials and learn more about [SpeechBrain](https://speechbrain.github.io). ### Perform Command Recognition ```python import torchaudio from speechbrain.pretrained import EncoderClassifier classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector") out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav') print(text_lab) out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav') print(text_lab) ``` ### Inference on GPU To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method. ### Training The model was trained with SpeechBrain (b7ff9dc4). To train it from scratch follows these steps: 1. Clone SpeechBrain: ```bash git clone https://github.com/speechbrain/speechbrain/ ``` 2. Install it: ``` cd speechbrain pip install -r requirements.txt pip install -e . ``` 3. Run Training: ``` cd recipes/Google-speech-commands python train.py hparams/xvect.yaml --data_folder=your_data_folder ``` You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing). ### Limitations The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets. #### Referencing xvectors ```@inproceedings{DBLP:conf/odyssey/SnyderGMSPK18, author = {David Snyder and Daniel Garcia{-}Romero and Alan McCree and Gregory Sell and Daniel Povey and Sanjeev Khudanpur}, title = {Spoken Language Recognition using X-vectors}, booktitle = {Odyssey 2018}, pages = {105--111}, year = {2018}, } ``` #### Referencing Google Speech Commands ```@article{speechcommands, author = { {Warden}, P.}, title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}", journal = {ArXiv e-prints}, archivePrefix = "arXiv", eprint = {1804.03209}, primaryClass = "cs.CL", keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction}, year = 2018, month = apr, url = {https://arxiv.org/abs/1804.03209}, } ``` #### Referencing SpeechBrain ``` @misc{SB2021, author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua }, title = {SpeechBrain}, year = {2021}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\\url{https://github.com/speechbrain/speechbrain}}, } ``` #### About SpeechBrain SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains. Website: https://speechbrain.github.io/ GitHub: https://github.com/speechbrain/speechbrain