Automatic Speech Recognition
ESPnet
Tamil
audio
dzeinali committed on
Commit
bc1afeb
1 Parent(s): 91e5efc

Update model

README.md ADDED
@@ -0,0 +1,436 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: ta
+ datasets:
+ - commonvoice
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/tamil_commonvoice_blstm`
+
+ This model was trained by dzeinali using the commonvoice recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ ```bash
+ cd espnet
+ git checkout 716eb8f92e19708acfd08ba3bd39d40890d3a84b
+ pip install -e .
+ cd egs2/commonvoice/asr1
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/tamil_commonvoice_blstm
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon May 2 11:41:47 EDT 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.6a1`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `716eb8f92e19708acfd08ba3bd39d40890d3a84b`
+ - Commit date: `Thu Apr 28 19:50:59 2022 -0400`
+
+ ## asr_train_asr_rnn_raw_ta_bpe150_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_ta|11499|72228|66.0|30.5|3.5|3.2|37.2|79.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_ta|11499|638106|93.5|3.8|2.7|1.8|8.3|79.9|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_ta|11499|422957|89.8|7.0|3.2|1.8|12.0|79.8|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/tuning/train_asr_rnn.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_rnn_raw_ta_bpe150_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: 3
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 30
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_ta_bpe150_sp/train/speech_shape
+ - exp/asr_stats_raw_ta_bpe150_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_ta_bpe150_sp/valid/speech_shape
+ - exp/asr_stats_raw_ta_bpe150_sp/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_ta_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_ta_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_ta/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_ta/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adadelta
+ optim_conf:
+     lr: 0.1
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ி
+ - ு
+ - ா
+ - வ
+ - ை
+ - ர
+ - ன
+ - ▁ப
+ - .
+ - ▁க
+ - ்
+ - ▁அ
+ - ட
+ - த
+ - க
+ - ே
+ - ம
+ - ல
+ - ம்
+ - ன்
+ - ும்
+ - ய
+ - ▁வ
+ - க்க
+ - ▁இ
+ - ▁த
+ - த்த
+ - ▁
+ - து
+ - ந்த
+ - ப
+ - ▁ச
+ - ிய
+ - ▁ம
+ - ோ
+ - ெ
+ - ர்
+ - ரு
+ - ழ
+ - ப்ப
+ - ண
+ - ொ
+ - ▁ந
+ - ட்ட
+ - ▁எ
+ - ற
+ - ைய
+ - ச
+ - ள
+ - க்
+ - ில்
+ - ங்க
+ - ','
+ - ண்ட
+ - ▁உ
+ - ன்ற
+ - ார்
+ - ப்
+ - ூ
+ - ல்
+ - ள்
+ - கள
+ - கள்
+ - ாக
+ - ற்ற
+ - டு
+ - ீ
+ - ந
+ - '!'
+ - '?'
+ - '"'
+ - ஏ
+ - ஸ
+ - ஞ
+ - ஷ
+ - ஜ
+ - ஓ
+ - '-'
+ - ஐ
+ - ஹ
+ - A
+ - E
+ - ங
+ - R
+ - N
+ - ஈ
+ - ஃ
+ - O
+ - I
+ - ;
+ - S
+ - T
+ - L
+ - எ
+ - இ
+ - அ
+ - H
+ - C
+ - D
+ - M
+ - U
+ - உ
+ - B
+ - G
+ - P
+ - Y
+ - ''''
+ - ௌ
+ - K
+ - ':'
+ - W
+ - ஆ
+ - F
+ - —
+ - V
+ - ”
+ - J
+ - Z
+ - ’
+ - ‘
+ - X
+ - Q
+ - (
+ - )
+ - ·
+ - –
+ - ⁄
+ - '3'
+ - '4'
+ - ◯
+ - _
+ - '&'
+ - ௗ
+ - •
+ - '`'
+ - ஔ
+ - “
+ - ஊ
+ - š
+ - ഥ
+ - '1'
+ - '2'
+ - á
+ - ‚
+ - é
+ - ô
+ - ஒ
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ model_conf:
+     ctc_weight: 0.5
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/ta_token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 2
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_ta_bpe150_sp/train/feats_stats.npz
+ preencoder: null
+ preencoder_conf: {}
+ encoder: vgg_rnn
+ encoder_conf:
+     rnn_type: lstm
+     bidirectional: true
+     use_projection: true
+     num_layers: 4
+     hidden_size: 1024
+     output_size: 1024
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf:
+     num_layers: 2
+     hidden_size: 1024
+     sampling_probability: 0
+     att_conf:
+         atype: location
+         adim: 1024
+         aconv_chans: 10
+         aconv_filts: 100
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.6a1
+ distributed: false
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
+ year={2018},
+ booktitle={Proceedings of Interspeech},
+ pages={2207--2211},
+ doi={10.21437/Interspeech.2018-1456},
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+ title={ESPnet: End-to-End Speech Processing Toolkit},
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+ year={2018},
+ eprint={1804.00015},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
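The `token_list` in the README's config uses the SentencePiece convention in which `▁` (U+2581) marks the start of a word. The sketch below shows how a sequence of such pieces could be joined back into text. The function name `pieces_to_text` and the example pieces are illustrative assumptions: the pieces are hand-picked from the token list and are not the segmentation the released `bpe.model` would necessarily produce.

```python
WORD_BOUNDARY = "\u2581"  # the "▁" marker that prefixes word-initial pieces


def pieces_to_text(pieces):
    """Join subword pieces and turn word-boundary markers into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Hand-picked pieces that all appear in the token list above (illustrative only).
pieces = ["▁அ", "ம்", "ம", "ா", "▁வ", "ா"]
text = pieces_to_text(pieces)
print(text)
```

Because the boundary is carried by the pieces themselves, no separate whitespace token is needed when decoding.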
data/ta_token_list/bpe_unigram150/bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3cfdadfef65143a17363e109504aa3c089bfd2f428ac82f692f9ef2c7d1ff09a
+ size 239549
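The three added lines are a Git LFS pointer: the repository stores the hash and byte size, while the SentencePiece model itself lives in LFS storage. A minimal sketch of reading such a pointer follows, assuming each line is a key and a value separated by one space. `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS or ESPnet.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:3cfdadfef65143a17363e109504aa3c089bfd2f428ac82f692f9ef2c7d1ff09a
size 239549
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
```

The recorded size lets a downloaded file be sanity-checked before loading it.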
exp/asr_stats_raw_ta_bpe150_sp/train/feats_stats.npz ADDED
Binary file (1.4 kB).
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/RESULTS.md ADDED
@@ -0,0 +1,29 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Mon May 2 11:41:47 EDT 2022`
+ - python version: `3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]`
+ - espnet version: `espnet 0.10.6a1`
+ - pytorch version: `pytorch 1.8.1+cu102`
+ - Git hash: `716eb8f92e19708acfd08ba3bd39d40890d3a84b`
+ - Commit date: `Thu Apr 28 19:50:59 2022 -0400`
+
+ ## asr_train_asr_rnn_raw_ta_bpe150_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_ta|11499|72228|66.0|30.5|3.5|3.2|37.2|79.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_ta|11499|638106|93.5|3.8|2.7|1.8|8.3|79.9|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_rnn_asr_model_valid.acc.ave/test_ta|11499|422957|89.8|7.0|3.2|1.8|12.0|79.8|
+
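The columns in the three tables are linked by the usual scoring identities: `Corr + Sub + Del` covers every reference unit (100%), and `Err = Sub + Del + Ins`. The sketch below re-derives `Corr` and `Err` from the reported `Sub`, `Del`, and `Ins` percentages as a consistency check. The helper name `error_summary` is an assumption introduced for illustration.

```python
def error_summary(sub, dele, ins):
    """Return (Corr, Err) in percent from Sub/Del/Ins percentages."""
    corr = round(100.0 - sub - dele, 1)  # reference units recognised correctly
    err = round(sub + dele + ins, 1)     # total error rate
    return corr, err


# (Sub, Del, Ins) copied from the WER, CER and TER rows above.
rows = {
    "WER": (30.5, 3.5, 3.2),
    "CER": (3.8, 2.7, 1.8),
    "TER": (7.0, 3.2, 1.8),
}
summary = {name: error_summary(*vals) for name, vals in rows.items()}
for name, (corr, err) in summary.items():
    print(f"{name}: Corr={corr} Err={err}")
```

Because insertions count against the model but are not reference units, `Corr + Err` can exceed 100, as it does in all three rows.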
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/config.yaml ADDED
@@ -0,0 +1,339 @@
+ config: conf/tuning/train_asr_rnn.yaml
+ print_config: false
+ log_level: INFO
+ dry_run: false
+ iterator_type: sequence
+ output_dir: exp/asr_train_asr_rnn_raw_ta_bpe150_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: null
+ dist_rank: null
+ local_rank: 0
+ dist_master_addr: null
+ dist_master_port: null
+ dist_launcher: null
+ multiprocessing_distributed: false
+ unused_parameters: false
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 15
+ patience: 3
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ -   - train
+     - loss
+     - min
+ -   - valid
+     - loss
+     - min
+ -   - train
+     - acc
+     - max
+ -   - valid
+     - acc
+     - max
+ keep_nbest_models:
+ - 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 1
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ pretrain_path: null
+ init_param: []
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 30
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/asr_stats_raw_ta_bpe150_sp/train/speech_shape
+ - exp/asr_stats_raw_ta_bpe150_sp/train/text_shape.bpe
+ valid_shape_file:
+ - exp/asr_stats_raw_ta_bpe150_sp/valid/speech_shape
+ - exp/asr_stats_raw_ta_bpe150_sp/valid/text_shape.bpe
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ train_data_path_and_name_and_type:
+ -   - dump/raw/train_ta_sp/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/train_ta_sp/text
+     - text
+     - text
+ valid_data_path_and_name_and_type:
+ -   - dump/raw/dev_ta/wav.scp
+     - speech
+     - sound
+ -   - dump/raw/dev_ta/text
+     - text
+     - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ valid_max_cache_size: null
+ optim: adadelta
+ optim_conf:
+     lr: 0.1
+ scheduler: null
+ scheduler_conf: {}
+ token_list:
+ - <blank>
+ - <unk>
+ - ி
+ - ு
+ - ா
+ - வ
+ - ை
+ - ர
+ - ன
+ - ▁ப
+ - .
+ - ▁க
+ - ்
+ - ▁அ
+ - ட
+ - த
+ - க
+ - ே
+ - ம
+ - ல
+ - ம்
+ - ன்
+ - ும்
+ - ய
+ - ▁வ
+ - க்க
+ - ▁இ
+ - ▁த
+ - த்த
+ - ▁
+ - து
+ - ந்த
+ - ப
+ - ▁ச
+ - ிய
+ - ▁ம
+ - ோ
+ - ெ
+ - ர்
+ - ரு
+ - ழ
+ - ப்ப
+ - ண
+ - ொ
+ - ▁ந
+ - ட்ட
+ - ▁எ
+ - ற
+ - ைய
+ - ச
+ - ள
+ - க்
+ - ில்
+ - ங்க
+ - ','
+ - ண்ட
+ - ▁உ
+ - ன்ற
+ - ார்
+ - ப்
+ - ூ
+ - ல்
+ - ள்
+ - கள
+ - கள்
+ - ாக
+ - ற்ற
+ - டு
+ - ீ
+ - ந
+ - '!'
+ - '?'
+ - '"'
+ - ஏ
+ - ஸ
+ - ஞ
+ - ஷ
+ - ஜ
+ - ஓ
+ - '-'
+ - ஐ
+ - ஹ
+ - A
+ - E
+ - ங
+ - R
+ - N
+ - ஈ
+ - ஃ
+ - O
+ - I
+ - ;
+ - S
+ - T
+ - L
+ - எ
+ - இ
+ - அ
+ - H
+ - C
+ - D
+ - M
+ - U
+ - உ
+ - B
+ - G
+ - P
+ - Y
+ - ''''
+ - ௌ
+ - K
+ - ':'
+ - W
+ - ஆ
+ - F
+ - —
+ - V
+ - ”
+ - J
+ - Z
+ - ’
+ - ‘
+ - X
+ - Q
+ - (
+ - )
+ - ·
+ - –
+ - ⁄
+ - '3'
+ - '4'
+ - ◯
+ - _
+ - '&'
+ - ௗ
+ - •
+ - '`'
+ - ஔ
+ - “
+ - ஊ
+ - š
+ - ഥ
+ - '1'
+ - '2'
+ - á
+ - ‚
+ - é
+ - ô
+ - ஒ
+ - <sos/eos>
+ init: null
+ input_size: null
+ ctc_conf:
+     dropout_rate: 0.0
+     ctc_type: builtin
+     reduce: true
+     ignore_nan_grad: true
+ joint_net_conf: null
+ model_conf:
+     ctc_weight: 0.5
+ use_preprocessor: true
+ token_type: bpe
+ bpemodel: data/ta_token_list/bpe_unigram150/bpe.model
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ frontend: default
+ frontend_conf:
+     fs: 16k
+ specaug: specaug
+ specaug_conf:
+     apply_time_warp: true
+     time_warp_window: 5
+     time_warp_mode: bicubic
+     apply_freq_mask: true
+     freq_mask_width_range:
+     - 0
+     - 27
+     num_freq_mask: 2
+     apply_time_mask: true
+     time_mask_width_ratio_range:
+     - 0.0
+     - 0.05
+     num_time_mask: 2
+ normalize: global_mvn
+ normalize_conf:
+     stats_file: exp/asr_stats_raw_ta_bpe150_sp/train/feats_stats.npz
+ preencoder: null
+ preencoder_conf: {}
+ encoder: vgg_rnn
+ encoder_conf:
+     rnn_type: lstm
+     bidirectional: true
+     use_projection: true
+     num_layers: 4
+     hidden_size: 1024
+     output_size: 1024
+ postencoder: null
+ postencoder_conf: {}
+ decoder: rnn
+ decoder_conf:
+     num_layers: 2
+     hidden_size: 1024
+     sampling_probability: 0
+     att_conf:
+         atype: location
+         adim: 1024
+         aconv_chans: 10
+         aconv_filts: 100
+ required:
+ - output_dir
+ - token_list
+ version: 0.10.6a1
+ distributed: false
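The `ctc_weight: 0.5` entry under `model_conf` sets how the hybrid CTC/attention objective is mixed. In its usual form, the training loss is a convex combination of the CTC loss and the attention-decoder loss. The sketch below illustrates that combination with made-up loss values. `joint_loss` is a hypothetical name and this is not ESPnet's own code.

```python
def joint_loss(loss_ctc, loss_att, ctc_weight=0.5):
    """Convex combination of CTC and attention losses (illustrative sketch)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Made-up loss values; only the weighting reflects the config above.
total = joint_loss(loss_ctc=40.0, loss_att=20.0, ctc_weight=0.5)
print(total)
```

With a weight of 0.5 both branches contribute equally; a weight of 0 would train the attention decoder alone and a weight of 1 would train CTC alone.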
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/acc.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/backward_time.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/cer.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/cer_ctc.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/forward_time.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/iter_time.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/loss.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/loss_att.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/loss_ctc.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/optim0_lr0.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/optim_step_time.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/train_time.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/images/wer.png ADDED
exp/asr_train_asr_rnn_raw_ta_bpe150_sp/valid.acc.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:258d079fea3fe9e6713c98f7ea2328191f5a73334b07a41d1b4932f99b6f213c
+ size 447985902
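The checkpoint name `valid.acc.ave_10best.pth`, together with `keep_nbest_models: 10` and the `valid`/`acc`/`max` entry in `best_model_criterion`, suggests that the released weights are an average of the ten checkpoints with the highest validation accuracy. The sketch below shows parameter averaging on toy data. It assumes checkpoints are plain dicts of float lists; real checkpoints are PyTorch state dicts, and `average_checkpoints` is a hypothetical helper.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean over a list of {param_name: [floats]} dicts."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Two toy "checkpoints" standing in for the ten best epochs.
toy = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
avg = average_checkpoints(toy)
print(avg)
```

Averaging nearby checkpoints is a cheap way to smooth out epoch-to-epoch variation without training an ensemble.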
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: 0.10.6a1
+ files:
+   asr_model_file: exp/asr_train_asr_rnn_raw_ta_bpe150_sp/valid.acc.ave_10best.pth
+ python: "3.9.5 (default, Jun 4 2021, 12:28:51) \n[GCC 7.5.0]"
+ timestamp: 1651506338.672178
+ torch: 1.8.1+cu102
+ yaml_files:
+   asr_train_config: exp/asr_train_asr_rnn_raw_ta_bpe150_sp/config.yaml
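The `timestamp` field in `meta.yaml` looks like a Unix epoch value in seconds. Assuming that interpretation, the sketch below converts it to a readable UTC time using only the standard library.

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; treating it as epoch seconds is an assumption.
timestamp = 1651506338.672178

packed_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(packed_at.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

Under that assumption the model was packed at 15:45:38 UTC on 2 May 2022 (11:45:38 EDT), a few minutes after the `date` recorded in RESULTS.md.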