speechbrainteam committed
Commit ac7e31d
1 Parent(s): fe1c8d4

Update README.md

Files changed (1)
  1. README.md +45 -22

README.md CHANGED
@@ -3,38 +3,43 @@ language: "en"
  thumbnail:
  tags:
  - embeddings
- - Speaker
- - Verification
- - Identification
  - pytorch
  - xvectors
  - TDNN
  license: "apache-2.0"
  datasets:
- - voxceleb
  metrics:
- - EER
- - min_dct
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

- # Speaker Verification with xvector embeddings on Voxceleb

- This repository provides all the necessary tools to extract speaker embeddings with a pretrained TDNN model using SpeechBrain.
- The system is trained on Voxceleb 1 + Voxceleb 2 training data.

  For a better experience, we encourage you to learn more about
- [SpeechBrain](https://speechbrain.github.io). The given model performance on the Voxceleb1 test set (cleaned) is:

- | Release | EER (%) |
  |:-------------:|:--------------:|
- | 05-03-21 | 3.2 |


  ## Pipeline description
- This system is composed of a TDNN model coupled with statistical pooling. The system is trained with Categorical Cross-Entropy Loss.

  ## Install SpeechBrain

@@ -47,21 +52,23 @@ pip install speechbrain
  Please note that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Compute your speaker embeddings

  ```python
  import torchaudio
  from speechbrain.pretrained import EncoderClassifier
- classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
- signal, fs = torchaudio.load('samples/audio_samples/example1.wav')
- embeddings = classifier.encode_batch(signal)
  ```
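As a complement, here is a minimal, illustrative sketch of turning two such embeddings into a verification score with plain cosine similarity. `spk1.wav` and `spk2.wav` are hypothetical local files, the embedding shape `[batch, 1, emb_dim]` is assumed, and the recipe itself may score trials differently (e.g. with PLDA):

```python
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")

# 'spk1.wav' and 'spk2.wav' are placeholder paths for two utterances to compare.
signal1, fs1 = torchaudio.load('spk1.wav')
signal2, fs2 = torchaudio.load('spk2.wav')

# encode_batch returns one embedding per utterance (assumed shape [batch, 1, emb_dim]);
# drop the singleton time dimension before scoring.
emb1 = classifier.encode_batch(signal1).squeeze(1)
emb2 = classifier.encode_batch(signal2).squeeze(1)

# Cosine similarity as a simple same-speaker score (higher = more similar).
score = torch.nn.functional.cosine_similarity(emb1, emb2, dim=-1)
print(score)
```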

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

  ### Training
- The model was trained with SpeechBrain (aa018540).
  To train it from scratch, follow these steps:
  1. Clone SpeechBrain:
  ```bash
@@ -76,11 +83,11 @@ pip install -e .

  3. Run Training:
  ```
- cd recipes/VoxCeleb/SpeakerRec/
- python train_speaker_embeddings.py hparams/train_x_vectors.yaml --data_folder=your_data_folder
  ```

- You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1RtCBJ3O8iOCkFrJItCKT9oL-Q1MNCwMH?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
@@ -100,6 +107,21 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  }
  ```


  #### Referencing SpeechBrain

@@ -110,7 +132,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
- howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }
  ```

@@ -120,3 +142,4 @@ SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to b
  Website: https://speechbrain.github.io/

  GitHub: https://github.com/speechbrain/speechbrain

@@ -3,38 +3,43 @@ language: "en"
  thumbnail:
  tags:
  - embeddings
+ - Commands
+ - Keywords
+ - Keyword Spotting
  - pytorch
  - xvectors
  - TDNN
+ - Command Recognition
  license: "apache-2.0"
  datasets:
+ - google speech commands
  metrics:
+ - Accuracy
+
  ---

  <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
  <br/><br/>

+ # Command Recognition with xvector embeddings on Google Speech Commands

+ This repository provides all the necessary tools to perform command recognition with SpeechBrain using a model pretrained on Google Speech Commands.
+ You can download the dataset [here](https://www.tensorflow.org/datasets/catalog/speech_commands).
+ The dataset provides small training, validation, and test sets useful for detecting single keywords in short audio clips. The provided system can recognize the following 12 keywords:
+ ```
+ 'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence'
+ ```

  For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io). The given model performance on the test set is:

+ | Release | Accuracy (%) |
  |:-------------:|:--------------:|
+ | 06-02-21 | 98.14 |


  ## Pipeline description
+ This system is composed of a TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of that.
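As an illustration only (this is not the SpeechBrain implementation, and all layer sizes below are arbitrary placeholders), a TDNN-plus-statistics-pooling classifier of this kind can be sketched in a few lines of PyTorch:

```python
import torch
import torch.nn as nn

class TdnnCommandClassifier(nn.Module):
    """Illustrative TDNN + statistics pooling + classifier (not the SpeechBrain code)."""

    def __init__(self, n_feats=40, n_classes=12):
        super().__init__()
        # TDNN: a stack of dilated 1-D convolutions over the time axis.
        self.tdnn = nn.Sequential(
            nn.Conv1d(n_feats, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        # Embedding layer applied to the pooled statistics (mean + std -> 3000 dims here).
        self.embedding = nn.Linear(2 * 1500, 512)
        # Classifier producing one logit per keyword.
        self.classifier = nn.Linear(512, n_classes)

    def forward(self, feats):
        # feats: [batch, time, n_feats]; Conv1d expects [batch, n_feats, time].
        x = self.tdnn(feats.transpose(1, 2))
        # Statistics pooling: concatenate mean and standard deviation over time.
        stats = torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)
        return self.classifier(torch.relu(self.embedding(stats)))

logits = TdnnCommandClassifier()(torch.randn(4, 200, 40))  # e.g. 4 utterances of 200 frames
print(logits.shape)  # torch.Size([4, 12])
```

Training such a model with `nn.CrossEntropyLoss` on the 12 keyword labels corresponds to the categorical cross-entropy objective mentioned above.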

  ## Install SpeechBrain

@@ -47,21 +52,23 @@ pip install speechbrain
  Please note that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

+ ### Perform Command Recognition

  ```python
  import torchaudio
  from speechbrain.pretrained import EncoderClassifier
+ classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")
+ out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
+ print(text_lab)
+ out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/stop.wav')
+ print(text_lab)
  ```
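If the audio is already loaded as a tensor, the same classifier can also be applied to a batch of waveforms. A small sketch, assuming the `classify_batch` method of the standard `EncoderClassifier` interface and a hypothetical 16 kHz recording `my_command.wav`:

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/google_speech_command_xvector", savedir="pretrained_models/google_speech_command_xvector")

# 'my_command.wav' is a placeholder path; the model expects single-channel 16 kHz audio.
signal, fs = torchaudio.load('my_command.wav')

# classify_batch takes already-loaded waveforms instead of a file path.
out_prob, score, index, text_lab = classifier.classify_batch(signal)
print(text_lab)
```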

  ### Inference on GPU
  To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
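For example, keeping the same model and files as above, the option is passed directly to `from_hparams`:

```python
from speechbrain.pretrained import EncoderClassifier

# run_opts moves the underlying model to the GPU; subsequent calls then run on CUDA.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/google_speech_command_xvector",
    savedir="pretrained_models/google_speech_command_xvector",
    run_opts={"device": "cuda"},
)
out_prob, score, index, text_lab = classifier.classify_file('speechbrain/google_speech_command_xvector/yes.wav')
print(text_lab)
```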

  ### Training
+ The model was trained with SpeechBrain (b7ff9dc4).
  To train it from scratch, follow these steps:
  1. Clone SpeechBrain:
  ```bash
@@ -76,11 +83,11 @@ pip install -e .

  3. Run Training:
  ```
+ cd recipes/Google-speech-commands
+ python train.py hparams/xvect.yaml --data_folder=your_data_folder
  ```

+ You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1BKwtr1mBRICRe56PcQk2sCFq63Lsvdpc?usp=sharing).

  ### Limitations
  The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
@@ -100,6 +107,21 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  }
  ```

+ #### Referencing Google Speech Commands
+ ```
+ @article{speechcommands,
+ author = { {Warden}, P.},
+ title = "{Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition}",
+ journal = {ArXiv e-prints},
+ archivePrefix = "arXiv",
+ eprint = {1804.03209},
+ primaryClass = "cs.CL",
+ keywords = {Computer Science - Computation and Language, Computer Science - Human-Computer Interaction},
+ year = 2018,
+ month = apr,
+ url = {https://arxiv.org/abs/1804.03209},
+ }
+ ```
+

  #### Referencing SpeechBrain

@@ -110,7 +132,7 @@ The SpeechBrain team does not provide any warranty on the performance achieved b
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
+ howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
  }
  ```

@@ -120,3 +142,4 @@ SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to b
  Website: https://speechbrain.github.io/

  GitHub: https://github.com/speechbrain/speechbrain
+