Jzuluaga committed
Commit b9cf7cb
1 Parent(s): a94f2da

Create final model!

README.md CHANGED
@@ -1,3 +1,145 @@
  ---
- license: mit
+ language:
+ - en
+ thumbnail:
+ tags:
+ - audio-classification
+ - speechbrain
+ - embeddings
+ - Accent
+ - Identification
+ - pytorch
+ - ECAPA-TDNN
+ - TDNN
+ - CommonAccent
+ license: "mit"
+ datasets:
+ - CommonVoice
+ metrics:
+ - Accuracy
+ widget:
+ - example_title: Australian English
+   src: australia_1.wav
+ - example_title: African English
+   src: african_1.wav
+ - example_title: Canadian English
+   src: canada_1.wav
  ---
+
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+ # Accent Identification from Speech Recordings with ECAPA embeddings on CommonAccent
+
+ This repository provides all the necessary tools to perform accent identification from speech recordings with SpeechBrain.
+ The system uses a model pretrained on the CommonAccent dataset in English (16 accents).
+ The provided system can recognize the following 16 accents of English from short speech recordings:
+
+ ```
+ african australia bermuda canada england hongkong indian ireland malaysia newzealand philippines scotland singapore southatlandtic us wales
+ ```
+
+ ### To UPDATE ALL BELOW
+
+ For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io). The model's performance on the test set is:
+
+ | Release | Accuracy (%) |
+ |:-------------:|:--------------:|
+ | 30-06-21 | 85.0 |
+
+
+ ## Pipeline description
+ This system is composed of an ECAPA-TDNN model coupled with statistical pooling. A classifier, trained with Categorical Cross-Entropy Loss, is applied on top of the pooled embeddings.
+
+ The system is trained with recordings sampled at 16 kHz (single channel).
+ The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *classify_file*, if needed. Make sure your input tensor matches the expected 16 kHz sampling rate if you use *encode_batch* or *classify_batch*.
+
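As an illustration, here is a minimal sketch of that manual normalization step for *encode_batch*/*classify_batch*, assuming the repo id `Jzuluaga/accent-id-commonaccent_ecapa` declared in `hyperparams.yaml` below and a hypothetical local file `my_recording.wav` (with `classify_file`, this resampling and channel selection happens automatically):

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="Jzuluaga/accent-id-commonaccent_ecapa",           # repo id from hyperparams.yaml
    savedir="pretrained_models/accent-id-commonaccent_ecapa",
)

# "my_recording.wav" is a placeholder for any local recording.
signal, sr = torchaudio.load("my_recording.wav")
if sr != 16000:
    # Bring the waveform to the 16 kHz sampling rate the model expects.
    signal = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)(signal)
# Collapse to a single (mono) channel; shape becomes (1, time), i.e. a batch of one.
signal = signal.mean(dim=0, keepdim=True)

embeddings = classifier.encode_batch(signal)                   # ECAPA embeddings (emb_dim=192)
out_prob, score, index, text_lab = classifier.classify_batch(signal)
print(text_lab)
```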
+ ## Install SpeechBrain
+
+ First of all, please install SpeechBrain with the following command:
+
+ ```
+ pip install speechbrain
+ ```
+
+ Please note that we encourage you to read our tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ### Perform Language Identification from Speech Recordings
+
+ ```python
+ import torchaudio
+ from speechbrain.pretrained import EncoderClassifier
+ classifier = EncoderClassifier.from_hparams(source="speechbrain/lang-id-commonlanguage_ecapa", savedir="pretrained_models/lang-id-commonlanguage_ecapa")
+ # Italian Example
+ out_prob, score, index, text_lab = classifier.classify_file('speechbrain/lang-id-commonlanguage_ecapa/example-it.wav')
+ print(text_lab)
+
+ # French Example
+ out_prob, score, index, text_lab = classifier.classify_file('speechbrain/lang-id-commonlanguage_ecapa/example-fr.wav')
+ print(text_lab)
+ ```
+
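Note that the snippet above still points at the `speechbrain/lang-id-commonlanguage_ecapa` model (see the update note earlier in this card). A tentative equivalent for this accent model, assuming the repo id declared in `hyperparams.yaml` and that the example files listed in the widget metadata (`australia_1.wav`, `african_1.wav`, `canada_1.wav`) are available in the repository, would look like:

```python
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(
    source="Jzuluaga/accent-id-commonaccent_ecapa",
    savedir="pretrained_models/accent-id-commonaccent_ecapa",
)

# Australian English example (file name taken from the widget metadata above)
out_prob, score, index, text_lab = classifier.classify_file(
    "Jzuluaga/accent-id-commonaccent_ecapa/australia_1.wav"
)
print(text_lab)
```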
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+
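For example, a quick sketch with the same assumed repo id:

```python
from speechbrain.pretrained import EncoderClassifier

# run_opts places the model and all computations on the GPU.
classifier = EncoderClassifier.from_hparams(
    source="Jzuluaga/accent-id-commonaccent_ecapa",
    savedir="pretrained_models/accent-id-commonaccent_ecapa",
    run_opts={"device": "cuda"},
)
```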
+ ### Training
+ The model was trained with SpeechBrain (a02f860e).
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run Training:
+ ```
+ cd recipes/CommonLanguage/lang_id
+ python train.py hparams/train_ecapa_tdnn.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1sD2u0MhSmJlx_3RRgwsYzevX81RM8-WE?usp=sharing).
+
+ ### Limitations
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ #### Referencing ECAPA
+ ```bibtex
+ @inproceedings{DBLP:conf/interspeech/DesplanquesTD20,
+   author    = {Brecht Desplanques and
+                Jenthe Thienpondt and
+                Kris Demuynck},
+   editor    = {Helen Meng and
+                Bo Xu and
+                Thomas Fang Zheng},
+   title     = {{ECAPA-TDNN:} Emphasized Channel Attention, Propagation and Aggregation
+                in {TDNN} Based Speaker Verification},
+   booktitle = {Interspeech 2020},
+   pages     = {3830--3834},
+   publisher = {{ISCA}},
+   year      = {2020},
+ }
+ ```
+
+
+ # **Citing SpeechBrain**
+ Please cite SpeechBrain if you use it for your research or business.
+
+
+ ```bibtex
+ @misc{speechbrain,
+   title={{SpeechBrain}: A General-Purpose Speech Toolkit},
+   author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
+   year={2021},
+   eprint={2106.04624},
+   archivePrefix={arXiv},
+   primaryClass={eess.AS},
+   note={arXiv:2106.04624}
+ }
+ ```
accent_encoder.txt ADDED
@@ -0,0 +1,18 @@
+ 'england' => 0
+ 'us' => 1
+ 'canada' => 2
+ 'australia' => 3
+ 'indian' => 4
+ 'scotland' => 5
+ 'ireland' => 6
+ 'african' => 7
+ 'malaysia' => 8
+ 'newzealand' => 9
+ 'southatlandtic' => 10
+ 'bermuda' => 11
+ 'philippines' => 12
+ 'hongkong' => 13
+ 'wales' => 14
+ 'singapore' => 15
+ ================
+ 'starting_index' => 0
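For illustration only, the same mapping as a plain Python dictionary. It can be used to turn a raw class index back into an accent label, although the pretrained interface already returns the label as `text_lab`:

```python
# Mirror of accent_encoder.txt: accent label -> index used by the classifier head.
ACCENT2INDEX = {
    "england": 0, "us": 1, "canada": 2, "australia": 3,
    "indian": 4, "scotland": 5, "ireland": 6, "african": 7,
    "malaysia": 8, "newzealand": 9, "southatlandtic": 10, "bermuda": 11,
    "philippines": 12, "hongkong": 13, "wales": 14, "singapore": 15,
}

# Reverse lookup: model output index -> accent label.
INDEX2ACCENT = {idx: accent for accent, idx in ACCENT2INDEX.items()}

print(INDEX2ACCENT[3])  # prints "australia"
```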
classifier.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:146a2c6cb236e387b24972797ed9aebb3b54b09b33a072bee87eb3576bd88c01
+ size 13172
config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "speechbrain_interface": "EncoderClassifier"
+ }
embedding_model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ffa5ac9c0ec21fd6677fa8a39b3b045f182236f5c312a8edca385dfa79e7e3c
+ size 83313275
hyperparams.yaml ADDED
@@ -0,0 +1,55 @@
+ # ############################################################################
+ # Model: ECAPA-TDNN for Accent Identification
+ # ############################################################################
+
+ # Pretrain folder (HuggingFace)
+ pretrained_path: Jzuluaga/accent-id-commonaccent_ecapa
+
+ # Feature parameters
+ n_mels: 80
+
+ # Output parameters
+ n_languages: 16 # Possible accents in the dataset
+ emb_dim: 192 # dimensionality of the embeddings
+
+ # Model params
+ compute_features: !new:speechbrain.lobes.features.Fbank
+     n_mels: !ref <n_mels>
+
+ mean_var_norm: !new:speechbrain.processing.features.InputNormalization
+     norm_type: sentence
+     std_norm: False
+
+ # Embedding Model
+ embedding_model: !new:speechbrain.lobes.models.ECAPA_TDNN.ECAPA_TDNN
+     input_size: !ref <n_mels>
+     activation: !name:torch.nn.LeakyReLU
+     channels: [1024, 1024, 1024, 1024, 3072]
+     kernel_sizes: [5, 3, 3, 3, 1]
+     dilations: [1, 2, 3, 4, 1]
+     attention_channels: 128
+     lin_neurons: !ref <emb_dim>
+
+ # Classifier based on cosine distance
+ classifier: !new:speechbrain.lobes.models.ECAPA_TDNN.Classifier
+     input_size: !ref <emb_dim>
+     out_neurons: !ref <n_languages>
+
+ modules:
+     compute_features: !ref <compute_features>
+     mean_var_norm: !ref <mean_var_norm>
+     embedding_model: !ref <embedding_model>
+     classifier: !ref <classifier>
+
+ label_encoder: !new:speechbrain.dataio.encoder.CategoricalEncoder
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         embedding_model: !ref <embedding_model>
+         classifier: !ref <classifier>
+         label_encoder: !ref <label_encoder>
+     paths:
+         embedding_model: !ref <pretrained_path>/embedding_model.ckpt
+         classifier: !ref <pretrained_path>/classifier.ckpt
+         label_encoder: !ref <pretrained_path>/accent_encoder.txt
+
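This is the file that `EncoderClassifier.from_hparams` consumes: it builds the modules and then lets the `pretrainer` fetch the checkpoints listed under `paths`. A rough sketch of that lower-level flow, assuming `hyperparams.yaml` has been downloaded locally (the high-level interface shown earlier remains the recommended route):

```python
from hyperpyyaml import load_hyperpyyaml

# Instantiate the modules declared in the YAML (ECAPA-TDNN, classifier, ...).
with open("hyperparams.yaml") as fin:
    hparams = load_hyperpyyaml(fin)

# Fetch the checkpoints listed under `paths` and load them into the modules.
pretrainer = hparams["pretrainer"]
pretrainer.collect_files()
pretrainer.load_collected()

embedding_model = hparams["embedding_model"]   # ECAPA-TDNN, 192-dim embeddings
classifier = hparams["classifier"]             # 16-way accent classifier
```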
normalizer_input.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f39c561f5e6e0030621263350a2c079cfcb675fb82b635402f57fdb7e1e21a5
+ size 1163