csukuangfj committed
Commit fefae8c
1 Parent(s): 1bd2a85

Add model.

Files changed (3)
  1. .gitattributes +3 -0
  2. README.md +82 -6
  3. exp/pretrained.pt +3 -0
.gitattributes CHANGED
@@ -25,3 +25,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zstandard filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ exp/cpu_jit.pt filter=lfs diff=lfs merge=lfs -text
+ exp/pretrained.pt filter=lfs diff=lfs merge=lfs -text
+ exp filter=lfs diff=lfs merge=lfs -text
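The three added lines route the new checkpoint files through Git LFS rather than plain git storage. For context (not part of this commit), attribute entries of this form are what `git lfs track` writes; a minimal sketch, assuming `git-lfs` is installed:

```
# One-time LFS setup in the clone, then track each large file.
git lfs install
git lfs track "exp/pretrained.pt"   # appends a filter=lfs line like the ones above
git lfs track "exp/cpu_jit.pt"
git add .gitattributes              # the updated attributes are committed alongside the models
```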
README.md CHANGED
@@ -51,7 +51,8 @@ The command for decoding is:
    --nbest-scale 0.5
  ```
  
- You can find the log in this repo: [log/log-decode-2021-11-09-17-38-28](log/log-decode-2021-11-09-17-38-28).
+ You can find the decoding log for the above command in this
+ repo: [log/log-decode-2021-11-09-17-38-28](log/log-decode-2021-11-09-17-38-28).
  
  The best WER for the librispeech test dataset is:
  
@@ -59,7 +60,7 @@ The best WER for the librispeech test dataset is:
  |-----|------------|------------|
  | WER | 2.42 | 5.73 |
  
- The best scale values are:
+ Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
  
  | ngram_lm_scale | attention_scale |
  |----------------|-----------------|
@@ -68,13 +69,14 @@ The best scale values are:
  
  # File description
  
- - `log/`, this directory contains the decoding log
- - `data/`, this directory contains files generated by `./prepare.sh`
+ - [log/](log), this directory contains the decoding log
+ - [test_wavs](test_wavs), this directory contains wave files for testing the pre-trained model
+ - [data/](data), this directory contains files generated by `./prepare.sh`
  
  Note: For the `data/lm` directory, we provide only `G_4_gram.pt`. If you need other files
  in this directory, please run `./prepare.sh`.
  
- - `exp`, this directory contains two files: `preprained.pt` and `cpu_jit.pt`.
+ - [exp](exp), this directory contains two files: `pretrained.pt` and `cpu_jit.pt`.
  
  `exp/pretrained.pt` is generated by the following command:
  ```
@@ -86,6 +88,13 @@ in this directory, please run `./prepare.sh`.
    --exp-dir conformer_ctc/exp_500_att0.8
  ```
  
+ **HINT**: To use `pretrained.pt` to compute the WER for test-clean and test-other,
+ copy it into the icefall experiment directory under the name `epoch-999.pt`:
+ ```
+ cp icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt /path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp/epoch-999.pt
+ ```
+ and pass `--epoch 999 --avg 1` to `conformer_ctc/decode.py`.
+ 
  `exp/cpu_jit.pt` is generated by the following command:
  ```
  ./conformer_ctc/export.py \
@@ -108,7 +117,73 @@ git checkout v2.0-pre
  mkdir build_release
  cd build_release
  cmake -DCMAKE_BUILD_TYPE=Release ..
- make -j ctc_decode ngram_lm_rescore attention_rescore
+ make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
+ ```
+ 
+ ## CTC decoding
+ ```
+ cd k2/build_release
+ ./bin/ctc_decode \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
+ ```
+ 
+ ## HLG decoding
+ 
+ ```
+ ./bin/hlg_decode \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
+   --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
  ```
  
+ ## HLG decoding + n-gram LM rescoring
+ 
+ **NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM.
+ A V100 GPU with 32 GB RAM is known to work.
+ 
+ ```
+ ./bin/ngram_lm_rescore \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
+   --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
+   --ngram_lm_scale 1.0 \
+   --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
+ ```
+ 
+ ## HLG decoding + n-gram LM rescoring + attention decoder rescoring
+ 
+ ```
+ ./bin/attention_rescore \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
+   --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
+   --ngram_lm_scale 2.0 \
+   --attention_scale 2.0 \
+   --num_paths 100 \
+   --nbest_scale 0.5 \
+   --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
+   --sos_id 1 \
+   --eos_id 1 \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
+ ```
+ 
+ **NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM.
+ A V100 GPU with 32 GB RAM is known to work.
+ 
  [icefall]: https://github.com/k2-fsa/icefall
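A note on the two checkpoints used above: `exp/pretrained.pt` holds the weights consumed by icefall's Python `conformer_ctc/decode.py` (renamed to `epoch-999.pt` as in the HINT), while `exp/cpu_jit.pt` is a TorchScript export consumed by the k2 C++ binaries. As a quick sanity check that the TorchScript file is self-contained, it can be loaded without any icefall code; a minimal sketch, assuming only PyTorch is installed:

```
# Loads the TorchScript model on CPU and prints its type; no icefall checkout required.
python3 -c "import torch; m = torch.jit.load('exp/cpu_jit.pt'); print(type(m))"
```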
exp/pretrained.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11b6dd6bc02557030840d729923b9ae3e6db3ae665f048fb0056609c205a9ef9
+ size 437166367
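The three `+` lines above are a Git LFS pointer, not the 437 MB checkpoint itself; a clone without LFS yields only this stub. A downloaded copy can be checked against the pointer's hash and size; a minimal sketch, assuming GNU coreutils for `stat -c`:

```
git lfs pull                    # fetch the real file behind the pointer
sha256sum exp/pretrained.pt     # expect 11b6dd6bc02557030840d729923b9ae3e6db3ae665f048fb0056609c205a9ef9
stat -c %s exp/pretrained.pt    # expect 437166367 (bytes)
```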