add model files
- README.md +931 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/RESULTS.md +29 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/config.yaml +815 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/acc.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/backward_time.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer_ctc.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/forward_time.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/gpu_max_cached_mem_GB.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/iter_time.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_att.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_ctc.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim0_lr0.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim_step_time.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/train_time.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/wer.png +0 -0
- exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/valid.acc.ave_10best.pth +3 -0
- meta.yaml +8 -0
- score.log +46 -0
README.md
ADDED
@@ -0,0 +1,931 @@
---
tags:
- espnet
- audio
- automatic-speech-recognition
language: en
datasets:
- slurp_entity
license: cc-by-4.0
---

## ESPnet2 ASR model

### `pyf98/slurp_entity_e_branchformer`

This model was trained by Yifan Peng using the slurp_entity recipe in [espnet](https://github.com/espnet/espnet/).

References:
- [E-Branchformer: Branchformer with Enhanced merging for speech recognition (SLT 2022)](https://arxiv.org/abs/2210.00077)
- [Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding (ICML 2022)](https://proceedings.mlr.press/v162/peng22a.html)

### Demo: How to use in ESPnet2

Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
if you haven't done that already.

```bash
cd espnet
git checkout 4bbd29a40cc7e2259996d30c0c76d3d789c1153d
pip install -e .
cd egs2/slurp_entity/asr1
./run.sh --skip_data_prep false --skip_train true --download_model pyf98/slurp_entity_e_branchformer
```

<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
## Environments
- date: `Mon Feb 27 19:14:30 CST 2023`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202301`
- pytorch version: `pytorch 1.13.1`
- Git hash: `4bbd29a40cc7e2259996d30c0c76d3d789c1153d`
- Commit date: `Sat Feb 25 21:54:03 2023 -0600`

## exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave_10best/devel|8690|178058|84.6|7.6|7.8|3.2|18.6|51.2|
|decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|83.7|7.7|8.6|3.0|19.3|49.7|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave_10best/devel|8690|847400|90.8|3.0|6.2|3.5|12.7|51.2|
|decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|89.7|3.1|7.2|3.4|13.6|49.7|
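
The `Err` column in the WER and CER tables appears to follow the usual scoring convention, where the error rate is the sum of substitutions, deletions, and insertions as a percentage of reference tokens. The sketch below re-derives `Err` from the rounded `Sub`, `Del`, and `Ins` columns under that assumption. The function name `error_rate` is illustrative and is not part of ESPnet.

```python
def error_rate(sub: float, dele: float, ins: float) -> float:
    """Sum the three error components (each in percent of reference tokens)."""
    return round(sub + dele + ins, 1)


# Rows copied from the tables: (Sub, Del, Ins, reported Err).
rows = {
    "WER devel": (7.6, 7.8, 3.2, 18.6),
    "WER test": (7.7, 8.6, 3.0, 19.3),
    "CER devel": (3.0, 6.2, 3.5, 12.7),
    "CER test": (3.1, 7.2, 3.4, 13.6),
}

for name, (sub, dele, ins, reported) in rows.items():
    derived = error_rate(sub, dele, ins)
    # The inputs are already rounded to one decimal, so allow a 0.1 gap.
    close = abs(derived - reported) <= 0.1 + 1e-9
    print(f"{name}: derived={derived} reported={reported} consistent={close}")
```

Because the published components are rounded to one decimal, a derived value can differ from the reported `Err` by 0.1, as it does for the CER test row.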

### Intent Classification

- Valid Intent Classification Result: 0.8781357882623706
- Test Intent Classification Result: 0.8743691695977979

### Entity

|SLU-F1|Precision|Recall|F-Measure|
|:---:|:---:|:---:|:---:|
| test | 0.7940 | 0.7582 | 0.7757 |
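
The F-Measure in the entity table is consistent with the harmonic mean of the precision and recall columns. The sketch below checks this under that assumed definition. The helper `f_measure` is illustrative and is not an ESPnet function.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set precision and recall copied from the entity table.
print(round(f_measure(0.7940, 0.7582), 4))  # prints 0.7757
```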


## ASR config

<details><summary>expand</summary>

```
config: conf/tuning/train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: sequence
output_dir: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
ngpu: 1
seed: 0
num_workers: 1
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: null
dist_rank: null
local_rank: 0
dist_master_addr: null
dist_master_port: null
dist_launcher: null
multiprocessing_distributed: false
unused_parameters: false
sharded_ddp: false
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 60
patience: null
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10
nbest_averaging_interval: 0
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 1
no_forward_run: false
resume: true
train_dtype: float32
use_amp: false
log_interval: null
use_matplotlib: true
use_tensorboard: true
create_graph_in_tensorboard: false
use_wandb: false
wandb_project: null
wandb_id: null
wandb_entity: null
wandb_name: null
wandb_model_log_interval: -1
detect_anomaly: false
pretrain_path: null
init_param: []
ignore_init_mismatch: false
freeze_param: []
num_iters_per_epoch: null
batch_size: 64
valid_batch_size: null
batch_bins: 1000000
valid_batch_bins: null
train_shape_file:
- exp/asr_stats_raw_en_word/train/speech_shape
- exp/asr_stats_raw_en_word/train/text_shape.word
valid_shape_file:
- exp/asr_stats_raw_en_word/valid/speech_shape
- exp/asr_stats_raw_en_word/valid/text_shape.word
batch_type: folded
valid_batch_type: null
fold_length:
- 80000
- 150
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 500
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
-   - dump/raw/train/wav.scp
    - speech
    - kaldi_ark
-   - dump/raw/train/text
    - text
    - text
valid_data_path_and_name_and_type:
-   - dump/raw/devel/wav.scp
    - speech
    - kaldi_ark
-   - dump/raw/devel/text
    - text
    - text
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
exclude_weight_decay: false
exclude_weight_decay_conf: {}
optim: adam
optim_conf:
    lr: 0.001
    weight_decay: 1.0e-06
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 35000
token_list:
- <blank>
- <unk>
- ▁SEP
- ▁FILL
- s
- ▁the
- a
- ▁to
- ▁i
- ▁me
- e
- ▁s
- ▁a
- i
- ▁you
- ▁what
- er
- ing
- u
- ▁is
- ''''
- o
- p
- ▁in
- ▁p
- y
- ▁my
- ▁please
- d
- c
- m
- ▁b
- l
- ▁m
- ▁c
- st
- date
- n
- ▁d
- le
- b
- ▁for
- re
- t
- ▁on
- en
- h
- 'on'
- ar
- person
- ▁re
- ▁f
- ▁g
- ▁of
- an
- ▁
- g
- ▁today
- ▁t
- or
- ▁it
- ▁this
- ▁h
- r
- f
- at
- ch
- ce
- place_name
- ▁email
- ▁do
- es
- ri
- ▁e
- ▁w
- ic
- in
- ▁that
- event_name
- ▁play
- ▁and
- al
- ▁n
- ▁can
- email_query
- ve
- ▁new
- day
- it
- ate
- ▁from
- ▁have
- k
- time
- ▁am
- media_type
- email_sendemail
- ent
- ▁olly
- qa_factoid
- se
- v
- et
- ck
- ▁any
- calendar_set
- ly
- th
- ▁how
- ▁meeting
- ed
- ▁tell
- ▁st
- x
- ur
- ro
- ▁at
- nd
- ▁list
- w
- ▁u
- ou
- ▁not
- ▁about
- ▁an
- ▁o
- general_negate
- ut
- ▁time
- ▁be
- ▁ch
- ▁are
- social_post
- business_name
- la
- ty
- play_music
- ot
- general_quirky
- ▁l
- ▁sh
- ▁tweet
- om
- ▁week
- um
- ▁one
- ter
- ▁he
- ▁up
- ▁com
- general_praise
- weather_query
- ▁next
- ▁th
- ▁check
- calendar_query
- ▁last
- ▁ro
- ad
- is
- ▁with
- ay
- ▁send
- pe
- ▁pm
- ▁tomorrow
- ▁j
- un
- ▁train
- general_explain
- ▁v
- one
- ▁r
- ra
- news_query
- ation
- ▁emails
- us
- if
- ct
- ▁co
- ▁add
- ▁will
- ▁se
- nt
- ▁was
- ine
- ▁de
- ▁set
- ▁ex
- ▁would
- ir
- ow
- ber
- general_repeat
- ight
- ook
- ▁again
- ▁song
- currency_name
- ll
- ▁ha
- ▁go
- relation
- te
- ion
- and
- ▁y
- ▁ye
- general_affirm
- general_confirm
- ery
- ▁po
- ff
- ▁we
- ▁turn
- ▁did
- ▁mar
- ▁alarm
- ▁like
- datetime_query
- ers
- ▁all
- ▁remind
- ▁so
- qa_definition
- ▁calendar
- end
- ▁said
- ci
- ▁off
- ▁john
- ▁day
- ss
- pla
- ume
- ▁get
- ail
- pp
- z
- ry
- am
- ▁need
- as
- ▁thank
- ▁wh
- ▁want
- ▁right
- ▁jo
- ▁facebook
- ▁k
- ge
- ld
- ▁fri
- ▁two
- general_dontcare
- ▁news
- ol
- oo
- ant
- ▁five
- ▁event
- ake
- definition_word
- transport_type
- ▁your
- vi
- orn
- op
- ▁weather
- ome
- ▁app
- ▁lo
- de
- ▁music
- weather_descriptor
- ak
- ke
- ▁there
- ▁si
- ▁lights
- ▁now
- ▁mo
- calendar_remove
- our
- ▁dollar
- food_type
- me
- ▁more
- ▁no
- ▁birthday
- orrect
- ▁rep
- ▁show
- play_radio
- ▁mon
- ▁does
- ood
- ag
- li
- ▁sto
- ▁contact
- cket
- email_querycontact
- ▁ev
- ▁could
- ange
- ▁just
- out
- ame
- .
- ▁ja
- ▁confirm
- qa_currency
- ▁man
- ▁late
- ▁think
- ▁some
- timeofday
- ▁bo
- qa_stock
- ong
- ▁start
- ▁work
- ▁ten
- int
- ▁command
- all
- ▁make
- ▁la
- j
- ▁answ
- ▁hour
- ▁cle
- ah
- ▁find
- ▁service
- ▁fa
- qu
- general_commandstop
- ai
- ▁when
- ▁te
- ▁by
- social_query
- ard
- ▁tw
- ul
- id
- ▁seven
- ▁where
- ▁much
- art
- ▁appointment
- ver
- artist_name
- el
- device_type
- ▁know
- ▁three
- ▁events
- ▁tr
- ▁li
- ork
- red
- ect
- ▁let
- ▁respon
- ▁par
- zz
- ▁give
- ▁twenty
- ▁ti
- ▁curre
- play_podcasts
- ▁radio
- cooking_recipe
- transport_query
- ▁con
- gh
- ▁le
- lists_query
- ▁rem
- recommendation_events
- house_place
- alarm_set
- play_audiobook
- ist
- ase
- music_genre
- ive
- ast
- player_setting
- ort
- lly
- news_topic
- list_name
- ▁playlist
- ▁ne
- business_type
- personal_info
- ind
- ust
- di
- ress
- recommendation_locations
- lists_createoradd
- iot_hue_lightoff
- lists_remove
- ord
- ▁light
- ere
- alarm_query
- audio_volume_mute
- music_query
- ▁audio
- rain
- ▁date
- ▁order
- audio_volume_up
- ▁ar
- ▁podcast
- transport_ticket
- mail
- iot_hue_lightchange
- iot_coffee
- radio_name
- ill
- ▁ri
- '@'
- takeaway_query
- song_name
- takeaway_order
- ▁ra
- email_addcontact
- play_game
- book
- transport_traffic
- ▁house
- music_likeness
- her
- transport_taxi
- iot_hue_lightdim
- ment
- ght
- fo
- order_type
- color_type
- '1'
- ven
- ould
- general_joke
- ess
- ain
- qa_maths
- ▁place
- ▁twe
- cast
- iot_cleaning
- ▁che
- ▁cont
- ith
- audiobook_name
- email_address
- game_name
- ▁cal
- general_frequency
- ▁tom
- ▁food
- act
- iot_hue_lightup
- '2'
- alarm_remove
- podcast_descriptor
- ▁definition
- audio_volume_down
- ▁media
- email_folder
- dia
- meal_type
- ▁mus
- recommendation_movies
- ▁ad
- ree
- pt
- now
- playlist_name
- ▁person
- change_amount
- ▁pla
- escri
- datetime_convert
- podcast_name
- ▁ab
- time_zone
- ▁def
- ting
- iot_wemo_on
- music_settings
- iot_wemo_off
- orre
- cy
- ank
- music_descriptor
- lar
- app_name
- row
- joke_type
- xt
- of
- ition
- ▁meet
- ink
- ▁confir
- transport_agency
- general_greet
- ▁business
- ▁art
- ▁ag
- urn
- escript
- rom
- ▁rel
- ▁au
- ▁currency
- audio_volume_other
- iot_hue_lighton
- ▁artist
- '?'
- ▁bus
- cooking_type
- movie_name
- coffee_type
- ingredient
- ather
- music_dislikeness
- sp
- q
- ▁ser
- esc
- ▁bir
- ▁cur
- name
- ▁tran
- ▁hou
- ek
- uch
- ▁conf
- ▁face
- '9'
- ▁birth
- I
- sw
- transport_descriptor
- ▁comm
- lease
- transport_name
- aid
- movie_type
- ▁device
- alarm_type
- audiobook_author
- '5'
- drink_type
- ▁joh
- ▁defin
- word
- ▁curren
- order
- iness
- W
- cooking_query
- sport_type
- ▁relation
- oint
- H
- '8'
- A
- '0'
- ▁dol
- vice
- ▁pers
- '&'
- T
- ▁appoint
- _
- '7'
- '3'
- '-'
- game_type
- ▁pod
- N
- M
- E
- list
- music_album
- dio
- ▁transport
- qa_query
- C
- O
- U
- query_detail
- ']'
- '['
- descriptor
- ':'
- spon
- <sos/eos>
init: null
input_size: null
ctc_conf:
    dropout_rate: 0.0
    ctc_type: builtin
    reduce: true
    ignore_nan_grad: null
    zero_infinity: true
joint_net_conf: null
use_preprocessor: true
token_type: word
bpemodel: null
non_linguistic_symbols: null
cleaner: null
g2p: null
speech_volume_normalize: null
rir_scp: null
rir_apply_prob: 1.0
noise_scp: null
noise_apply_prob: 1.0
noise_db_range: '13_15'
short_noise_thres: 0.5
aux_ctc_tasks: []
frontend: default
frontend_conf:
    fs: 16k
specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2
normalize: utterance_mvn
normalize_conf: {}
model: espnet
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false
    extract_feats_in_collect_stats: false
preencoder: null
preencoder_conf: {}
encoder: e_branchformer
encoder_conf:
    output_size: 512
    attention_heads: 8
    attention_layer_type: rel_selfattn
    pos_enc_layer_type: rel_pos
    rel_pos_type: latest
    cgmlp_linear_units: 3072
    cgmlp_conv_kernel: 31
    use_linear_after_conv: false
    gate_activation: identity
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    layer_drop_rate: 0.1
    linear_units: 1024
    positionwise_layer_type: linear
    macaron_ffn: true
    use_ffn: true
    merge_conv_kernel: 31
postencoder: null
postencoder_conf: {}
decoder: transformer
decoder_conf:
    attention_heads: 8
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.1
    src_attention_dropout_rate: 0.1
    layer_drop_rate: 0.2
preprocessor: default
preprocessor_conf: {}
required:
- output_dir
- token_list
version: '202301'
distributed: false
```
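
The config sets `scheduler: warmuplr` with `warmup_steps: 35000` and a peak `lr: 0.001`. The sketch below shows what that schedule looks like, assuming the Noam-style rule that ESPnet's `WarmupLR` is understood to follow: a linear ramp to the peak over the warmup steps, then inverse-square-root decay. The function `warmup_lr` is a standalone illustration, not the library class.

```python
def warmup_lr(step: int, base_lr: float = 0.001, warmup_steps: int = 35000) -> float:
    """Learning rate at a 1-based optimizer step under the assumed warmup rule."""
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


# Sample the schedule before, at, and after the end of warmup.
for step in (1, 17500, 35000, 140000):
    print(step, f"{warmup_lr(step):.6f}")
```

Under these assumptions the rate reaches 0.001 at step 35000 and falls back to half of that by step 140000, four times the warmup length.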

</details>
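
The `model_conf` section of the config sets `ctc_weight: 0.3`. In hybrid CTC/attention training this weight interpolates between the CTC loss and the attention-decoder loss. The sketch below shows that weighting. The function name and the loss values are made up for illustration and do not come from the training logs.

```python
def hybrid_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the CTC and attention-decoder losses."""
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att


# Hypothetical per-batch losses, chosen only to make the arithmetic visible.
print(hybrid_loss(loss_ctc=20.0, loss_att=10.0))
```

With `ctc_weight: 0.3`, the attention branch contributes 70% of the total, so the example combines to roughly 0.3 * 20 + 0.7 * 10 = 13.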


### Citing ESPnet

```bibtex
@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
```

or arXiv:

```bibtex
@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/RESULTS.md
ADDED
@@ -0,0 +1,29 @@
<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
## Environments
- date: `Mon Feb 27 19:14:30 CST 2023`
- python version: `3.9.15 (main, Nov 24 2022, 14:31:59) [GCC 11.2.0]`
- espnet version: `espnet 202301`
- pytorch version: `pytorch 1.13.1`
- Git hash: `4bbd29a40cc7e2259996d30c0c76d3d789c1153d`
- Commit date: `Sat Feb 25 21:54:03 2023 -0600`

## exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave_10best/devel|8690|178058|84.6|7.6|7.8|3.2|18.6|51.2|
|decode_asr_asr_model_valid.acc.ave_10best/test|13078|262176|83.7|7.7|8.6|3.0|19.3|49.7|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_asr_model_valid.acc.ave_10best/devel|8690|847400|90.8|3.0|6.2|3.5|12.7|51.2|
|decode_asr_asr_model_valid.acc.ave_10best/test|13078|1245475|89.7|3.1|7.2|3.4|13.6|49.7|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/config.yaml
ADDED
@@ -0,0 +1,815 @@
+config: conf/tuning/train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop.yaml
+print_config: false
+log_level: INFO
+dry_run: false
+iterator_type: sequence
+output_dir: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word
+ngpu: 1
+seed: 0
+num_workers: 1
+num_att_plot: 3
+dist_backend: nccl
+dist_init_method: env://
+dist_world_size: null
+dist_rank: null
+local_rank: 0
+dist_master_addr: null
+dist_master_port: null
+dist_launcher: null
+multiprocessing_distributed: false
+unused_parameters: false
+sharded_ddp: false
+cudnn_enabled: true
+cudnn_benchmark: false
+cudnn_deterministic: true
+collect_stats: false
+write_collected_feats: false
+max_epoch: 60
+patience: null
+val_scheduler_criterion:
+- valid
+- loss
+early_stopping_criterion:
+- valid
+- loss
+- min
+best_model_criterion:
+-   - valid
+    - acc
+    - max
+keep_nbest_models: 10
+nbest_averaging_interval: 0
+grad_clip: 5.0
+grad_clip_type: 2.0
+grad_noise: false
+accum_grad: 1
+no_forward_run: false
+resume: true
+train_dtype: float32
+use_amp: false
+log_interval: null
+use_matplotlib: true
+use_tensorboard: true
+create_graph_in_tensorboard: false
+use_wandb: false
+wandb_project: null
+wandb_id: null
+wandb_entity: null
+wandb_name: null
+wandb_model_log_interval: -1
+detect_anomaly: false
+pretrain_path: null
+init_param: []
+ignore_init_mismatch: false
+freeze_param: []
+num_iters_per_epoch: null
+batch_size: 64
+valid_batch_size: null
+batch_bins: 1000000
+valid_batch_bins: null
+train_shape_file:
+- exp/asr_stats_raw_en_word/train/speech_shape
+- exp/asr_stats_raw_en_word/train/text_shape.word
+valid_shape_file:
+- exp/asr_stats_raw_en_word/valid/speech_shape
+- exp/asr_stats_raw_en_word/valid/text_shape.word
+batch_type: folded
+valid_batch_type: null
+fold_length:
+- 80000
+- 150
+sort_in_batch: descending
+sort_batch: descending
+multiple_iterator: false
+chunk_length: 500
+chunk_shift_ratio: 0.5
+num_cache_chunks: 1024
+train_data_path_and_name_and_type:
+-   - dump/raw/train/wav.scp
+    - speech
+    - kaldi_ark
+-   - dump/raw/train/text
+    - text
+    - text
+valid_data_path_and_name_and_type:
+-   - dump/raw/devel/wav.scp
+    - speech
+    - kaldi_ark
+-   - dump/raw/devel/text
+    - text
+    - text
+allow_variable_data_keys: false
+max_cache_size: 0.0
+max_cache_fd: 32
+valid_max_cache_size: null
+exclude_weight_decay: false
+exclude_weight_decay_conf: {}
+optim: adam
+optim_conf:
+    lr: 0.001
+    weight_decay: 1.0e-06
+scheduler: warmuplr
+scheduler_conf:
+    warmup_steps: 35000
+token_list:
+- <blank>
+- <unk>
+- ▁SEP
+- ▁FILL
+- s
+- ▁the
+- a
+- ▁to
+- ▁i
+- ▁me
+- e
+- ▁s
+- ▁a
+- i
+- ▁you
+- ▁what
+- er
+- ing
+- u
+- ▁is
+- ''''
+- o
+- p
+- ▁in
+- ▁p
+- y
+- ▁my
+- ▁please
+- d
+- c
+- m
+- ▁b
+- l
+- ▁m
+- ▁c
+- st
+- date
+- n
+- ▁d
+- le
+- b
+- ▁for
+- re
+- t
+- ▁on
+- en
+- h
+- 'on'
+- ar
+- person
+- ▁re
+- ▁f
+- ▁g
+- ▁of
+- an
+- ▁
+- g
+- ▁today
+- ▁t
+- or
+- ▁it
+- ▁this
+- ▁h
+- r
+- f
+- at
+- ch
+- ce
+- place_name
+- ▁email
+- ▁do
+- es
+- ri
+- ▁e
+- ▁w
+- ic
+- in
+- ▁that
+- event_name
+- ▁play
+- ▁and
+- al
+- ▁n
+- ▁can
+- email_query
+- ve
+- ▁new
+- day
+- it
+- ate
+- ▁from
+- ▁have
+- k
+- time
+- ▁am
+- media_type
+- email_sendemail
+- ent
+- ▁olly
+- qa_factoid
+- se
+- v
+- et
+- ck
+- ▁any
+- calendar_set
+- ly
+- th
+- ▁how
+- ▁meeting
+- ed
+- ▁tell
+- ▁st
+- x
+- ur
+- ro
+- ▁at
+- nd
+- ▁list
+- w
+- ▁u
+- ou
+- ▁not
+- ▁about
+- ▁an
+- ▁o
+- general_negate
+- ut
+- ▁time
+- ▁be
+- ▁ch
+- ▁are
+- social_post
+- business_name
+- la
+- ty
+- play_music
+- ot
+- general_quirky
+- ▁l
+- ▁sh
+- ▁tweet
+- om
+- ▁week
+- um
+- ▁one
+- ter
+- ▁he
+- ▁up
+- ▁com
+- general_praise
+- weather_query
+- ▁next
+- ▁th
+- ▁check
+- calendar_query
+- ▁last
+- ▁ro
+- ad
+- is
+- ▁with
+- ay
+- ▁send
+- pe
+- ▁pm
+- ▁tomorrow
+- ▁j
+- un
+- ▁train
+- general_explain
+- ▁v
+- one
+- ▁r
+- ra
+- news_query
+- ation
+- ▁emails
+- us
+- if
+- ct
+- ▁co
+- ▁add
+- ▁will
+- ▁se
+- nt
+- ▁was
+- ine
+- ▁de
+- ▁set
+- ▁ex
+- ▁would
+- ir
+- ow
+- ber
+- general_repeat
+- ight
+- ook
+- ▁again
+- ▁song
+- currency_name
+- ll
+- ▁ha
+- ▁go
+- relation
+- te
+- ion
+- and
+- ▁y
+- ▁ye
+- general_affirm
+- general_confirm
+- ery
+- ▁po
+- ff
+- ▁we
+- ▁turn
+- ▁did
+- ▁mar
+- ▁alarm
+- ▁like
+- datetime_query
+- ers
+- ▁all
+- ▁remind
+- ▁so
+- qa_definition
+- ▁calendar
+- end
+- ▁said
+- ci
+- ▁off
+- ▁john
+- ▁day
+- ss
+- pla
+- ume
+- ▁get
+- ail
+- pp
+- z
+- ry
+- am
+- ▁need
+- as
+- ▁thank
+- ▁wh
+- ▁want
+- ▁right
+- ▁jo
+- ▁facebook
+- ▁k
+- ge
+- ld
+- ▁fri
+- ▁two
+- general_dontcare
+- ▁news
+- ol
+- oo
+- ant
+- ▁five
+- ▁event
+- ake
+- definition_word
+- transport_type
+- ▁your
+- vi
+- orn
+- op
+- ▁weather
+- ome
+- ▁app
+- ▁lo
+- de
+- ▁music
+- weather_descriptor
+- ak
+- ke
+- ▁there
+- ▁si
+- ▁lights
+- ▁now
+- ▁mo
+- calendar_remove
+- our
+- ▁dollar
+- food_type
+- me
+- ▁more
+- ▁no
+- ▁birthday
+- orrect
+- ▁rep
+- ▁show
+- play_radio
+- ▁mon
+- ▁does
+- ood
+- ag
+- li
+- ▁sto
+- ▁contact
+- cket
+- email_querycontact
+- ▁ev
+- ▁could
+- ange
+- ▁just
+- out
+- ame
+- .
+- ▁ja
+- ▁confirm
+- qa_currency
+- ▁man
+- ▁late
+- ▁think
+- ▁some
+- timeofday
+- ▁bo
+- qa_stock
+- ong
+- ▁start
+- ▁work
+- ▁ten
+- int
+- ▁command
+- all
+- ▁make
+- ▁la
+- j
+- ▁answ
+- ▁hour
+- ▁cle
+- ah
+- ▁find
+- ▁service
+- ▁fa
+- qu
+- general_commandstop
+- ai
+- ▁when
+- ▁te
+- ▁by
+- social_query
+- ard
+- ▁tw
+- ul
+- id
+- ▁seven
+- ▁where
+- ▁much
+- art
+- ▁appointment
+- ver
+- artist_name
+- el
+- device_type
+- ▁know
+- ▁three
+- ▁events
+- ▁tr
+- ▁li
+- ork
+- red
+- ect
+- ▁let
+- ▁respon
+- ▁par
+- zz
+- ▁give
+- ▁twenty
+- ▁ti
+- ▁curre
+- play_podcasts
+- ▁radio
+- cooking_recipe
+- transport_query
+- ▁con
+- gh
+- ▁le
+- lists_query
+- ▁rem
+- recommendation_events
+- house_place
+- alarm_set
+- play_audiobook
+- ist
+- ase
+- music_genre
+- ive
+- ast
+- player_setting
+- ort
+- lly
+- news_topic
+- list_name
+- ▁playlist
+- ▁ne
+- business_type
+- personal_info
+- ind
+- ust
+- di
+- ress
+- recommendation_locations
+- lists_createoradd
+- iot_hue_lightoff
+- lists_remove
+- ord
+- ▁light
+- ere
+- alarm_query
+- audio_volume_mute
+- music_query
+- ▁audio
+- rain
+- ▁date
+- ▁order
+- audio_volume_up
+- ▁ar
+- ▁podcast
+- transport_ticket
+- mail
+- iot_hue_lightchange
+- iot_coffee
+- radio_name
+- ill
+- ▁ri
+- '@'
+- takeaway_query
+- song_name
+- takeaway_order
+- ▁ra
+- email_addcontact
+- play_game
+- book
+- transport_traffic
+- ▁house
+- music_likeness
+- her
+- transport_taxi
+- iot_hue_lightdim
+- ment
+- ght
+- fo
+- order_type
+- color_type
+- '1'
+- ven
+- ould
+- general_joke
+- ess
+- ain
+- qa_maths
+- ▁place
+- ▁twe
+- cast
+- iot_cleaning
+- ▁che
+- ▁cont
+- ith
+- audiobook_name
+- email_address
+- game_name
+- ▁cal
+- general_frequency
+- ▁tom
+- ▁food
+- act
+- iot_hue_lightup
+- '2'
+- alarm_remove
+- podcast_descriptor
+- ▁definition
+- audio_volume_down
+- ▁media
+- email_folder
+- dia
+- meal_type
+- ▁mus
+- recommendation_movies
+- ▁ad
+- ree
+- pt
+- now
+- playlist_name
+- ▁person
+- change_amount
+- ▁pla
+- escri
+- datetime_convert
+- podcast_name
+- ▁ab
+- time_zone
+- ▁def
+- ting
+- iot_wemo_on
+- music_settings
+- iot_wemo_off
+- orre
+- cy
+- ank
+- music_descriptor
+- lar
+- app_name
+- row
+- joke_type
+- xt
+- of
+- ition
+- ▁meet
+- ink
+- ▁confir
+- transport_agency
+- general_greet
+- ▁business
+- ▁art
+- ▁ag
+- urn
+- escript
+- rom
+- ▁rel
+- ▁au
+- ▁currency
+- audio_volume_other
+- iot_hue_lighton
+- ▁artist
+- '?'
+- ▁bus
+- cooking_type
+- movie_name
+- coffee_type
+- ingredient
+- ather
+- music_dislikeness
+- sp
+- q
+- ▁ser
+- esc
+- ▁bir
+- ▁cur
+- name
+- ▁tran
+- ▁hou
+- ek
+- uch
+- ▁conf
+- ▁face
+- '9'
+- ▁birth
+- I
+- sw
+- transport_descriptor
+- ▁comm
+- lease
+- transport_name
+- aid
+- movie_type
+- ▁device
+- alarm_type
+- audiobook_author
+- '5'
+- drink_type
+- ▁joh
+- ▁defin
+- word
+- ▁curren
+- order
+- iness
+- W
+- cooking_query
+- sport_type
+- ▁relation
+- oint
+- H
+- '8'
+- A
+- '0'
+- ▁dol
+- vice
+- ▁pers
+- '&'
+- T
+- ▁appoint
+- _
+- '7'
+- '3'
+- '-'
+- game_type
+- ▁pod
+- N
+- M
+- E
+- list
+- music_album
+- dio
+- ▁transport
+- qa_query
+- C
+- O
+- U
+- query_detail
+- ']'
+- '['
+- descriptor
+- ':'
+- spon
+- <sos/eos>
+init: null
+input_size: null
+ctc_conf:
+    dropout_rate: 0.0
+    ctc_type: builtin
+    reduce: true
+    ignore_nan_grad: null
+    zero_infinity: true
+joint_net_conf: null
+use_preprocessor: true
+token_type: word
+bpemodel: null
+non_linguistic_symbols: null
+cleaner: null
+g2p: null
+speech_volume_normalize: null
+rir_scp: null
+rir_apply_prob: 1.0
+noise_scp: null
+noise_apply_prob: 1.0
+noise_db_range: '13_15'
+short_noise_thres: 0.5
+aux_ctc_tasks: []
+frontend: default
+frontend_conf:
+    fs: 16k
+specaug: specaug
+specaug_conf:
+    apply_time_warp: true
+    time_warp_window: 5
+    time_warp_mode: bicubic
+    apply_freq_mask: true
+    freq_mask_width_range:
+    - 0
+    - 30
+    num_freq_mask: 2
+    apply_time_mask: true
+    time_mask_width_range:
+    - 0
+    - 40
+    num_time_mask: 2
+normalize: utterance_mvn
+normalize_conf: {}
+model: espnet
+model_conf:
+    ctc_weight: 0.3
+    lsm_weight: 0.1
+    length_normalized_loss: false
+    extract_feats_in_collect_stats: false
+preencoder: null
+preencoder_conf: {}
+encoder: e_branchformer
+encoder_conf:
+    output_size: 512
+    attention_heads: 8
+    attention_layer_type: rel_selfattn
+    pos_enc_layer_type: rel_pos
+    rel_pos_type: latest
+    cgmlp_linear_units: 3072
+    cgmlp_conv_kernel: 31
+    use_linear_after_conv: false
+    gate_activation: identity
+    num_blocks: 12
+    dropout_rate: 0.1
+    positional_dropout_rate: 0.1
+    attention_dropout_rate: 0.1
+    input_layer: conv2d
+    layer_drop_rate: 0.1
+    linear_units: 1024
+    positionwise_layer_type: linear
+    macaron_ffn: true
+    use_ffn: true
+    merge_conv_kernel: 31
+postencoder: null
+postencoder_conf: {}
+decoder: transformer
+decoder_conf:
+    attention_heads: 8
+    linear_units: 2048
+    num_blocks: 6
+    dropout_rate: 0.1
+    positional_dropout_rate: 0.1
+    self_attention_dropout_rate: 0.1
+    src_attention_dropout_rate: 0.1
+    layer_drop_rate: 0.2
+preprocessor: default
+preprocessor_conf: {}
+required:
+- output_dir
+- token_list
+version: '202301'
+distributed: false
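The config pairs `optim_conf.lr: 0.001` with `scheduler: warmuplr` and `warmup_steps: 35000`. As a rough illustration of what those two numbers imply, the sketch below assumes the scheduler follows the common Noam-style rule (linear ramp up to the base learning rate, then inverse-square-root decay). That formula is an assumption about the scheduler, not something stated in this commit, and `warmup_lr` is a hypothetical helper name.

```python
def warmup_lr(step: int, base_lr: float = 0.001, warmup_steps: int = 35000) -> float:
    """Learning rate at a 1-based optimizer step.

    Assumed rule (Noam-style warmup, not confirmed by this commit):
        lr = base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    The defaults mirror optim_conf.lr and scheduler_conf.warmup_steps above.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate during the ramp, at the peak, and during the decay.
    for step in (1, 1000, 17500, 35000, 70000, 140000):
        print(f"step {step:>7d}: lr = {warmup_lr(step):.3e}")
```

Under this assumption the rate peaks at the base value exactly when `step == warmup_steps` and has halved by four times that step count.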
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/acc.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/backward_time.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/cer_ctc.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/forward_time.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/gpu_max_cached_mem_GB.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/iter_time.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_att.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/loss_ctc.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim0_lr0.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/optim_step_time.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/train_time.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/images/wer.png
ADDED
exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/valid.acc.ave_10best.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:138f1491cf079c779fe50fbbb016d2bded08ddc1e3d375075e99d24aa3bb6e31
+size 441177571
meta.yaml
ADDED
@@ -0,0 +1,8 @@
+espnet: '202301'
+files:
+  asr_model_file: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/valid.acc.ave_10best.pth
+python: "3.9.15 (main, Nov 24 2022, 14:31:59) \n[GCC 11.2.0]"
+timestamp: 1677546947.945574
+torch: 1.13.1
+yaml_files:
+  asr_train_config: exp/asr_train_asr_e_branchformer_e12_mlp3072_linear1024_layerdrop_raw_en_word/config.yaml
score.log
ADDED
@@ -0,0 +1,46 @@
+Valid Intent Classification Result
+0.8781357882623706
+Test Intent Classification Result
+0.8743691695977979
+╒════════════╤═════════════╤══════════╤═════════════╕
+│ Scenario   │   Precision │   Recall │   F-Measure │
+╞════════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL    │      0.9084 │   0.9084 │      0.9084 │
+╘════════════╧═════════════╧══════════╧═════════════╛
+
+╒══════════╤═════════════╤══════════╤═════════════╕
+│ Action   │   Precision │   Recall │   F-Measure │
+╞══════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL  │      0.8852 │   0.8852 │      0.8852 │
+╘══════════╧═════════════╧══════════╧═════════════╛
+
+╒═════════════════════╤═════════════╤══════════╤═════════════╕
+│ Intent (scen_act)   │   Precision │   Recall │   F-Measure │
+╞═════════════════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL             │      0.8744 │   0.8744 │      0.8744 │
+╘═════════════════════╧═════════════╧══════════╧═════════════╛
+
+╒════════════╤═════════════╤══════════╤═════════════╕
+│ Entities   │   Precision │   Recall │   F-Measure │
+╞════════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL    │      0.7378 │   0.7015 │      0.7192 │
+╘════════════╧═════════════╧══════════╧═════════════╛
+
+╒════════════════════════════╤═════════════╤══════════╤═════════════╕
+│ Entities (distance word)   │   Precision │   Recall │   F-Measure │
+╞════════════════════════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL                    │      0.7760 │   0.7418 │      0.7585 │
+╘════════════════════════════╧═════════════╧══════════╧═════════════╛
+
+╒════════════════════════════╤═════════════╤══════════╤═════════════╕
+│ Entities (distance char)   │   Precision │   Recall │   F-Measure │
+╞════════════════════════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL                    │      0.8129 │   0.7754 │      0.7937 │
+╘════════════════════════════╧═════════════╧══════════╧═════════════╛
+
+╒══════════╤═════════════╤══════════╤═════════════╕
+│ Slu f1   │   Precision │   Recall │   F-Measure │
+╞══════════╪═════════════╪══════════╪═════════════╡
+│ OVERALL  │      0.7940 │   0.7582 │      0.7757 │
+╘══════════╧═════════════╧══════════╧═════════════╛
+
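The F-Measure column in the score.log tables can be sanity-checked against the Precision and Recall columns. The sketch below assumes F-Measure is the usual harmonic mean of precision and recall (F1); the scoring script itself is not part of this commit, so that is an assumption, and `f_measure` is a hypothetical helper name. The numbers are copied from the OVERALL rows of the entity and SLU tables.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Measure) taken from the OVERALL rows in score.log.
reported_rows = {
    "Entities": (0.7378, 0.7015, 0.7192),
    "Entities (distance word)": (0.7760, 0.7418, 0.7585),
    "Entities (distance char)": (0.8129, 0.7754, 0.7937),
    "Slu f1": (0.7940, 0.7582, 0.7757),
}

if __name__ == "__main__":
    for name, (p, r, reported) in reported_rows.items():
        recomputed = f_measure(p, r)
        print(f"{name}: recomputed {recomputed:.4f}, reported {reported:.4f}")
```

The Scenario, Action and Intent rows have identical precision, recall and F-Measure, so the same formula holds for them trivially.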