speechbrainteam committed 2d8381e (parent: d5b7184)

Update README.md

Files changed (1): README.md (+41 −46)
README.md CHANGED
@@ -3,16 +3,16 @@ language: "en"
 thumbnail:
 tags:
 - embeddings
-- Commands
 - Keywords
 - Keyword Spotting
 - pytorch
-- xvectors
 - TDNN
 - Command Recognition
 license: "apache-2.0"
 datasets:
-- google speech commands
 metrics:
 - Accuracy
 
@@ -21,25 +21,25 @@ metrics:
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
 
-# Command Recognition with xvector embeddings on Google Speech Commands
-
-This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
-You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands)
-The dataset provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:
 ```
-'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence'
 ```
 
 For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:
 
-| Release | Accuracy(%)
 |:-------------:|:--------------:|
-| 06-02-21 | 98.14 |
 
 
 ## Pipeline description
-This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
 
 ## Install SpeechBrain
 
@@ -52,15 +52,13 @@ pip install speechbrain
 Please note that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
-### Perform Command Recognition
 
 ```python
 import torchaudio
 from speechbrain.pretrained import EncoderClassifier
-classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
-out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
-print(text_lab)
-out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
 print(text_lab)
 ```
 
@@ -68,7 +66,7 @@ print(text_lab)
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
 
 ### Training
-The model was trained with SpeechBrain (b7ff9dc4).
 To train it from scratch, follow these steps:
 1. Clone SpeechBrain:
 ```bash
@@ -83,43 +81,40 @@ pip install -e .
 
 3. Run Training:
 ```
-cd recipes/Google-speech-commands
-python train.py hparams/xvect.yaml --data_folder=your_data_folder
 ```
 
-You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).
 
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
-#### Referencing xvectors
-```@inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
-author = {David Snyder and
-Daniel Garcia{-}Romero and
-Alan McCree and
-Gregory Sell and
-Daniel Povey and
-Sanjeev Khudanpur},
-title = {Spoken Language Recognition using X-vectors},
-booktitle = {Odyssey 2018},
-pages = {105--111},
-year = {2018},
 }
 ```
 
-#### Referencing Google Speech Commands
-```@article{speechcommands,
-author = { {Warden}, P.},
-title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
-journal = {ArXiv e-prints},
-archivePrefix = "arXiv",
-eprint = {1804.03209},
-primaryClass = "cs.CL",
-keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
-year = 2018,
-month = apr,
-url = {https://arxiv.org/abs/1804.03209},
-}
 ```
 
@@ -132,7 +127,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
 year = {2021},
 publisher = {GitHub},
 journal = {GitHub repository},
-howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
 }
 ```
 
 
 thumbnail:
 tags:
 - embeddings
+- Sound
 - Keywords
 - Keyword Spotting
 - pytorch
+- ECAPA-TDNN
 - TDNN
 - Command Recognition
 license: "apache-2.0"
 datasets:
+- UrbanSound8k
 metrics:
 - Accuracy
 
 
 <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
 <br/><br/>
 
+# Sound Recognition with ECAPA embeddings on UrbanSound8k
+
+This repository provides all the necessary tools to perform sound recognition with SpeechBrain using a model pretrained on UrbanSound8k.
+You can download the dataset [here](https://urbansounddataset.weebly.com/urbansound8k.html).
+The provided system can recognize the following 10 sound classes:
 ```
+dog_bark, children_playing, air_conditioner, street_music, gun_shot, siren, engine_idling, jackhammer, drilling, car_horn
 ```
 
 For a better experience, we encourage you to learn more about
 [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:
 
+| Release | Accuracy 1-fold (%) |
 |:-------------:|:--------------:|
+| 04-06-21 | 75.5 |
 
 
 ## Pipeline description
+This system is composed of an ECAPA model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
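
The statistics-pooling step named above can be sketched in a few lines. This is an illustrative NumPy version under assumed shapes (192-dimensional frame embeddings), not SpeechBrain's actual pooling layer: it collapses a variable-length sequence of frame embeddings into one fixed-size utterance vector by concatenating the per-channel mean and standard deviation.

```python
import numpy as np

def stat_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse [T, C] frame embeddings into a fixed-size [2*C] vector."""
    mean = frames.mean(axis=0)  # per-channel mean over time
    std = frames.std(axis=0)    # per-channel standard deviation over time
    return np.concatenate([mean, std])

# 200 frames of hypothetical 192-dim embeddings -> one 384-dim vector,
# regardless of how many frames the clip produced.
frames = np.random.randn(200, 192)
print(stat_pool(frames).shape)  # (384,)
```

Because the pooled vector's size no longer depends on the clip length, the classifier on top can be an ordinary fixed-input network.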
 
 ## Install SpeechBrain
 
 Please note that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
+### Perform Sound Recognition
 
 ```python
 import torchaudio
 from speechbrain.pretrained import EncoderClassifier
+classifier = EncoderClassifier.from_hparams(source="speechbrain/urbansound8k_ecapa", savedir="pretrained_models/urbansound8k_ecapa")
+out_prob, score, index, text_lab = classifier.classify_file('speechbrain/urbansound8k_ecapa/dog_bark.wav')
 print(text_lab)
 ```
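
For context on the four values `classify_file` returns, the final decision amounts to an argmax over class posteriors. Here is a standalone, hedged sketch with made-up probabilities (not SpeechBrain's internals):

```python
# The 10 UrbanSound8k classes this model predicts
labels = ["dog_bark", "children_playing", "air_conditioner", "street_music",
          "gun_shot", "siren", "engine_idling", "jackhammer", "drilling", "car_horn"]

# Hypothetical posteriors for one clip (illustrative numbers only)
out_prob = [0.91, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005]

index = max(range(len(out_prob)), key=out_prob.__getitem__)  # winning class id
score = out_prob[index]                                      # its posterior
text_lab = labels[index]                                     # its readable label
print(text_lab, score)  # dog_bark 0.91
```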
 
 
 To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
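
When the target machine may or may not have a GPU, the `run_opts` dictionary can be built defensively. This is a small sketch, not something the API requires:

```python
# Choose the inference device without assuming CUDA (or even torch) is present.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # torch missing in this environment
    device = "cpu"

run_opts = {"device": device}
print(run_opts)
# Then pass it along: EncoderClassifier.from_hparams(..., run_opts=run_opts)
```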
 
 ### Training
+The model was trained with SpeechBrain (8cab8b0c).
 To train it from scratch, follow these steps:
 1. Clone SpeechBrain:
 ```bash
 
 
 3. Run Training:
 ```
+cd recipes/UrbanSound8k/SoundClassification
+python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
 ```
 
+You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1sItfg_WNuGX6h2dCs8JTGq2v2QoNTaUg?usp=sharing).
 
 ### Limitations
 The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
+#### Referencing ECAPA
+```
+@inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
+  author = {Brecht Desplanques and Jenthe Thienpondt and Kris Demuynck},
+  editor = {Helen Meng and Bo Xu and Thomas Fang Zheng},
+  title = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation in {TDNN} Based Speaker Verification},
+  booktitle = {Interspeech 2020},
+  pages = {3830--3834},
+  publisher = {{ISCA}},
+  year = {2020},
 }
 ```
 
+#### Referencing UrbanSound
+```
+@inproceedings{Salamon:UrbanSound:ACMMM:14,
+  Author = {Salamon, J. and Jacoby, C. and Bello, J. P.},
+  Booktitle = {22nd {ACM} International Conference on Multimedia (ACM-MM'14)},
+  Month = {Nov.},
+  Pages = {1041--1044},
+  Title = {A Dataset and Taxonomy for Urban Sound Research},
+  Year = {2014}}
 ```
 
 
 year = {2021},
 publisher = {GitHub},
 journal = {GitHub repository},
+howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
 }
 ```