dzeinali committed on
Commit
4772c25
1 Parent(s): 4dc63d0

Update model

README.md ADDED
@@ -0,0 +1,439 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: fa
+ datasets:
+ - commonvoice
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/farsi_commonvoice_blstm`
+
+ This model was trained by dzeinali using the commonvoice recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 716eb8f92e19708acfd08ba3bd39d40890d3a84b
+ pip install -e .
+ cd egs2/commonvoice/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/farsi_commonvoice_blstm
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon May 2 11:48:56 EDT 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.6a1`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `716eb8f92e19708acfd08ba3bd39d40890d3a84b`
+ - Commit date: `Thu Apr 28 19:50:59 2022 -0400`
+
+ ## asr_train_asr_rnn_raw_fa_bpe150_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_fa|9728|68904|0.0|0.0|100.0|0.0|100.0|100.0|
+ |decode_rnn_asr_model_valid.acc.best/test_fa|9728|68904|91.4|7.2|1.4|1.0|9.5|30.1|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_fa|9728|331506|0.0|0.0|100.0|0.0|100.0|100.0|
+ |decode_rnn_asr_model_valid.acc.best/test_fa|9728|331506|97.2|1.3|1.5|0.7|3.6|30.1|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_fa|9728|230963|0.0|0.0|100.0|0.0|100.0|100.0|
+ |decode_rnn_asr_model_valid.acc.best/test_fa|9728|230963|95.9|2.4|1.6|0.7|4.7|30.1|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_rnn.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_rnn_raw_fa_bpe150_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: 3
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 30
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_fa_bpe150_sp/train/speech_shape
+ - exp/asr_stats_raw_fa_bpe150_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_fa_bpe150_sp/valid/speech_shape
+ - exp/asr_stats_raw_fa_bpe150_sp/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_fa_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_fa_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_fa/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_fa/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adadelta
+ optim_conf:
+     lr: 0.1
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ی
+ - ا
+ - ه
+ - ▁
+ - ر
+ - م
+ - و
+ - د
+ - ت
+ - ش
+ - ن
+ - ل
+ - ▁ب
+ - ز
+ - ب
+ - .
+ - ▁م
+ - ان
+ - ▁ا
+ - س
+ - ک
+ - ▁می
+ - گ
+ - ف
+ - ▁د
+ - ؟
+ - ق
+ - ▁و
+ - ید
+ - ▁ن
+ - ند
+ - ست
+ - ار
+ - ▁چ
+ - ع
+ - ج
+ - ▁ت
+ - ▁ک
+ - ▁با
+ - خ
+ - ون
+ - ▁پ
+ - ▁به
+ - ▁من
+ - ▁س
+ - ▁را
+ - ،
+ - ▁خ
+ - ▁این
+ - ▁کن
+ - ▁آ
+ - ▁در
+ - ای
+ - ▁از
+ - اد
+ - ▁است
+ - ح
+ - ص
+ - ▁ش
+ - ط
+ - ▁تو
+ - ین
+ - ▁دار
+ - ▁که
+ - ال
+ - ▁رو
+ - ▁گ
+ - ▁ج
+ - ور
+ - ام
+ - ▁هم
+ - ▁ح
+ - فت
+ - رد
+ - یم
+ - پ
+ - غ
+ - چ
+ - ذ
+ - ض
+ - ظ
+ - '!'
+ - ث
+ - ً
+ - ئ
+ - '"'
+ - ژ
+ - ك
+ - آ
+ - ي
+ - ':'
+ - ى
+ - '-'
+ - ِ
+ - أ
+ - َ
+ - »
+ - ـ
+ - ','
+ - ُ
+ - (
+ - )
+ - ء
+ - ٔ
+ - ٬
+ - ّ
+ - ؛
+ - B
+ - C
+ - A
+ - E
+ - G
+ - M
+ - S
+ - ؤ
+ - I
+ - ;
+ - T
+ - H
+ - _
+ - F
+ - D
+ - ۀ
+ - Y
+ - N
+ - K
+ - U
+ - –
+ - ٌ
+ - P
+ - O
+ - Q
+ - Z
+ - '&'
+ - L
+ - R
+ - ة
+ - X
+ - ā
+ - '#'
+ - “
+ - '='
+ - «
+ - š
+ - ْ
+ - ے
+ - ”
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ model_conf:
+     ctc_weight: 0.5
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/fa_token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 2
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_fa_bpe150_sp/train/feats_stats.npz
+ preencoder: null
+ preencoder_conf: {}
+ encoder: vgg_rnn
+ encoder_conf:
+     rnn_type: lstm
+     bidirectional: true
+     use_projection: true
+     num_layers: 4
+     hidden_size: 1024
+     output_size: 1024
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf:
+     num_layers: 2
+     hidden_size: 1024
+     sampling_probability: 0
+     att_conf:
+         atype: location
+         adim: 1024
+         aconv_chans: 10
+         aconv_filts: 100
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.6a1
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
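
The README's demo section runs the full recipe through `run.sh`. For quick experiments, the model can probably also be called directly from Python. The following is only a minimal sketch and is not part of this commit: it assumes that `espnet`, `espnet_model_zoo` and `soundfile` are installed, that `Speech2Text.from_pretrained` can resolve the `espnet/farsi_commonvoice_blstm` tag, and that the input is a mono recording sampled at 16 kHz (matching `fs: 16k` in the training config). The names `transcribe`, `MODEL_TAG` and `EXPECTED_SAMPLE_RATE` are hypothetical.

```python
MODEL_TAG = "espnet/farsi_commonvoice_blstm"  # model tag from the README
EXPECTED_SAMPLE_RATE = 16000                  # matches `fs: 16k` in the training config


def transcribe(wav_path: str, model_tag: str = MODEL_TAG) -> str:
    """Return the 1-best transcript for a mono 16 kHz audio file."""
    # Heavy imports are done lazily so that importing this module stays cheap.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # Assumption: resolving a model by tag requires the `espnet_model_zoo` package.
    speech2text = Speech2Text.from_pretrained(model_tag)

    speech, rate = soundfile.read(wav_path)
    if rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {rate} Hz")

    # Each n-best entry is a tuple (text, tokens, token_ids, hypothesis).
    text, *_ = speech2text(speech)[0]
    return text
```

A call such as `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, would download the model on first use and return the recognized Persian text.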
data/fa_token_list/bpe_unigram150/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37973014e2648943c8039425244b075496461c86d8217375350fb35a79e2ddd5
+ size 239472
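
The three lines above are a Git LFS pointer, not the BPE model itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As a rough illustration, a pointer like this could be parsed and used to check a downloaded file as sketched below. `LfsPointer`, `parse_lfs_pointer` and `matches` are hypothetical helper names, not part of Git LFS or ESPnet.

```python
import hashlib
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return LfsPointer(fields["version"], hash_algo, digest, int(fields["size"]))


def matches(pointer: LfsPointer, data: bytes) -> bool:
    """Check that `data` has the size and SHA-256 digest recorded in the pointer."""
    return (
        len(data) == pointer.size
        and hashlib.sha256(data).hexdigest() == pointer.digest
    )


# Pointer content copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:37973014e2648943c8039425244b075496461c86d8217375350fb35a79e2ddd5
size 239472
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer.hash_algo, pointer.size)
```

The same helper applies to the pointer for `6epoch.pth`; only the digest and size differ.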
exp/asr_stats_raw_fa_bpe150_sp/train/feats_stats.npz ADDED
Binary file (1.4 kB).
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/6epoch.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f730498097d77d754ce1111aab0a9218f57f277e98b05b8d3e23912b2d4ce91b
+ size 447981362
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/RESULTS.md ADDED
@@ -0,0 +1,32 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon May 2 11:48:56 EDT 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.6a1`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `716eb8f92e19708acfd08ba3bd39d40890d3a84b`
+ - Commit date: `Thu Apr 28 19:50:59 2022 -0400`
+
+ ## asr_train_asr_rnn_raw_fa_bpe150_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_fa|9728|68904|0.0|0.0|100.0|0.0|100.0|100.0|
+ |decode_rnn_asr_model_valid.acc.best/test_fa|9728|68904|91.4|7.2|1.4|1.0|9.5|30.1|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_fa|9728|331506|0.0|0.0|100.0|0.0|100.0|100.0|
+ |decode_rnn_asr_model_valid.acc.best/test_fa|9728|331506|97.2|1.3|1.5|0.7|3.6|30.1|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_fa|9728|230963|0.0|0.0|100.0|0.0|100.0|100.0|
+ |decode_rnn_asr_model_valid.acc.best/test_fa|9728|230963|95.9|2.4|1.6|0.7|4.7|30.1|
+
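
In these tables, `Corr`, `Sub`, `Del`, `Ins` and `Err` are percentages of the reference units counted in `Wrd` (words, characters or BPE tokens, depending on the table), and `S.Err` is the share of sentences with at least one error. Assuming the usual scoring convention, `Err` is the sum of `Sub`, `Del` and `Ins`, and `Corr + Sub + Del` adds up to 100, up to rounding of the one-decimal values. The sketch below checks that relationship on a row copied from the WER table; `parse_row` and `is_consistent` are hypothetical helper names.

```python
COLUMNS = ["dataset", "Snt", "Wrd", "Corr", "Sub", "Del", "Ins", "Err", "S.Err"]


def parse_row(row: str) -> dict:
    """Split one markdown table row from RESULTS.md into named, typed fields."""
    cells = row.strip().strip("|").split("|")
    record = dict(zip(COLUMNS, cells))
    for key in ("Snt", "Wrd"):
        record[key] = int(record[key])
    for key in COLUMNS[3:]:
        record[key] = float(record[key])
    return record


def is_consistent(rec: dict, tol: float = 0.2) -> bool:
    """Check Err ~ Sub + Del + Ins and Corr + Sub + Del ~ 100, allowing for rounding."""
    err_gap = abs(rec["Sub"] + rec["Del"] + rec["Ins"] - rec["Err"])
    corr_gap = abs(rec["Corr"] + rec["Sub"] + rec["Del"] - 100.0)
    return err_gap <= tol + 1e-9 and corr_gap <= tol + 1e-9


WER_BEST = "|decode_rnn_asr_model_valid.acc.best/test_fa|9728|68904|91.4|7.2|1.4|1.0|9.5|30.1|"
WER_AVE = "|decode_rnn_asr_model_valid.acc.ave/test_fa|9728|68904|0.0|0.0|100.0|0.0|100.0|100.0|"

best = parse_row(WER_BEST)
ave = parse_row(WER_AVE)

# Approximate absolute number of word errors behind the 9.5% figure.
approx_word_errors = round(best["Err"] / 100 * best["Wrd"])
print(is_consistent(best), is_consistent(ave), approx_word_errors)
```

The `valid.acc.ave` row is internally consistent as well: every reference unit is counted as a deletion, which corresponds to empty hypotheses for that decoding run.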
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/config.yaml ADDED
@@ -0,0 +1,339 @@
+ config: conf/tuning/train_asr_rnn.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_rnn_raw_fa_bpe150_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: 3
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 30
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_fa_bpe150_sp/train/speech_shape
+ - exp/asr_stats_raw_fa_bpe150_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_fa_bpe150_sp/valid/speech_shape
+ - exp/asr_stats_raw_fa_bpe150_sp/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_fa_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_fa_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_fa/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_fa/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adadelta
+ optim_conf:
+     lr: 0.1
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ی
+ - ا
+ - ه
+ - ▁
+ - ر
+ - م
+ - و
+ - د
+ - ت
+ - ش
+ - ن
+ - ل
+ - ▁ب
+ - ز
+ - ب
+ - .
+ - ▁م
+ - ان
+ - ▁ا
+ - س
+ - ک
+ - ▁می
+ - گ
+ - ف
+ - ▁د
+ - ؟
+ - ق
+ - ▁و
+ - ید
+ - ▁ن
+ - ند
+ - ست
+ - ار
+ - ▁چ
+ - ع
+ - ج
+ - ▁ت
+ - ▁ک
+ - ▁با
+ - خ
+ - ون
+ - ▁پ
+ - ▁به
+ - ▁من
+ - ▁س
+ - ▁را
+ - ،
+ - ▁خ
+ - ▁این
+ - ▁کن
+ - ▁آ
+ - ▁در
+ - ای
+ - ▁از
+ - اد
+ - ▁است
+ - ح
+ - ص
+ - ▁ش
+ - ط
+ - ▁تو
+ - ین
+ - ▁دار
+ - ▁که
+ - ال
+ - ▁رو
+ - ▁گ
+ - ▁ج
+ - ور
+ - ام
+ - ▁هم
+ - ▁ح
+ - فت
+ - رد
+ - یم
+ - پ
+ - غ
+ - چ
+ - ذ
+ - ض
+ - ظ
+ - '!'
+ - ث
+ - ً
+ - ئ
+ - '"'
+ - ژ
+ - ك
+ - آ
+ - ي
+ - ':'
+ - ى
+ - '-'
+ - ِ
+ - أ
+ - َ
+ - »
+ - ـ
+ - ','
+ - ُ
+ - (
+ - )
+ - ء
+ - ٔ
+ - ٬
+ - ّ
+ - ؛
+ - B
+ - C
+ - A
+ - E
+ - G
+ - M
+ - S
+ - ؤ
+ - I
+ - ;
+ - T
+ - H
+ - _
+ - F
+ - D
+ - ۀ
+ - Y
+ - N
+ - K
+ - U
+ - –
+ - ٌ
+ - P
+ - O
+ - Q
+ - Z
+ - '&'
+ - L
+ - R
+ - ة
+ - X
+ - ā
+ - '#'
+ - “
+ - '='
+ - «
+ - š
+ - ْ
+ - ے
+ - ”
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ model_conf:
+     ctc_weight: 0.5
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/fa_token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 2
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_fa_bpe150_sp/train/feats_stats.npz
+ preencoder: null
+ preencoder_conf: {}
+ encoder: vgg_rnn
+ encoder_conf:
+     rnn_type: lstm
+     bidirectional: true
+     use_projection: true
+     num_layers: 4
+     hidden_size: 1024
+     output_size: 1024
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf:
+     num_layers: 2
+     hidden_size: 1024
+     sampling_probability: 0
+     att_conf:
+         atype: location
+         adim: 1024
+         aconv_chans: 10
+         aconv_filts: 100
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.6a1
+ distributed: false
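
The `specaug_conf` block enables SpecAugment-style augmentation: two frequency masks with widths in the range 0 to 27 bins, and two time masks whose width is at most 5% of the utterance length. The following pure-Python sketch illustrates only the masking part under stated assumptions: time warping is omitted, masked cells are set to zero, widths are drawn uniformly with inclusive bounds, and the feature matrix is a plain list of frames. ESPnet's own implementation works on tensors and may differ in these details; `mask_spans` and `spec_augment` are hypothetical names.

```python
import random


def mask_spans(length, num_masks, max_width, rng):
    """Draw `num_masks` half-open (start, end) spans of width 0..max_width inside `length`."""
    spans = []
    for _ in range(num_masks):
        width = rng.randint(0, min(max_width, length))  # inclusive bounds (assumption)
        start = rng.randint(0, length - width)
        spans.append((start, start + width))
    return spans


def spec_augment(feats, rng, freq_width=27, num_freq=2, time_ratio=0.05, num_time=2):
    """Zero out random frequency bands and time spans of a [frames][bins] matrix."""
    n_frames, n_bins = len(feats), len(feats[0])
    out = [row[:] for row in feats]  # copy so the input stays untouched

    # Frequency masks: the same band of bins is blanked in every frame.
    for lo, hi in mask_spans(n_bins, num_freq, freq_width, rng):
        for row in out:
            row[lo:hi] = [0.0] * (hi - lo)

    # Time masks: whole frames are blanked; the width limit scales with utterance length.
    max_time = int(n_frames * time_ratio)
    for lo, hi in mask_spans(n_frames, num_time, max_time, rng):
        for t in range(lo, hi):
            out[t] = [0.0] * n_bins
    return out


# Usage on a dummy 200-frame utterance with 80 feature bins (the bin count is an assumption).
features = [[1.0] * 80 for _ in range(200)]
augmented = spec_augment(features, random.Random(0))
```

With 200 frames, each time mask covers at most 10 frames, so at most 20 frames are blanked in total, while at most 54 of the 80 bins are blanked by the two frequency masks.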
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/acc.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/backward_time.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/cer.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/forward_time.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/iter_time.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/loss.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/loss_att.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/train_time.png ADDED
exp/asr_train_asr_rnn_raw_fa_bpe150_sp/images/wer.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.6a1
+ files:
+     asr_model_file: exp/asr_train_asr_rnn_raw_fa_bpe150_sp/6epoch.pth
+ python: "3.9.5 (default, Jun 4 2021, 12:28:51) \n[GCC 7.5.0]"
+ timestamp: 1651506588.805354
+ torch: 1.8.1+cu102
+ yaml_files:
+     asr_train_config: exp/asr_train_asr_rnn_raw_fa_bpe150_sp/config.yaml
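
`meta.yaml` ties the packed files together: it names the checkpoint and training config and records the tool versions plus a Unix timestamp, presumably the time the model was packed. The sketch below shows one way to interpret those fields. To stay dependency-free it copies the values into a dictionary by hand; a real script would load the file with a YAML parser. `META`, `packed_at`, `model_path` and `config_path` are hypothetical names.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Values copied from meta.yaml above.
META = {
    "espnet": "0.10.6a1",
    "files": {"asr_model_file": "exp/asr_train_asr_rnn_raw_fa_bpe150_sp/6epoch.pth"},
    "timestamp": 1651506588.805354,
    "torch": "1.8.1+cu102",
    "yaml_files": {"asr_train_config": "exp/asr_train_asr_rnn_raw_fa_bpe150_sp/config.yaml"},
}

# Interpret the timestamp as seconds since the Unix epoch, in UTC.
packed_at = datetime.fromtimestamp(META["timestamp"], tz=timezone.utc)

model_path = PurePosixPath(META["files"]["asr_model_file"])
config_path = PurePosixPath(META["yaml_files"]["asr_train_config"])

print(packed_at.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2022-05-02 15:49:48 UTC
print(model_path.parent == config_path.parent)      # True: both live in the same exp directory
```

The timestamp corresponds to 11:49:48 EDT on 2 May 2022, under a minute after the date recorded in `RESULTS.md`.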