csukuangfj committed
Commit fefae8c
1 Parent(s): 1bd2a85

Add model.

Files changed (3)
  1. .gitattributes +3 -0
  2. README.md +82 -6
  3. exp/pretrained.pt +3 -0
.gitattributes CHANGED
@@ -25,3 +25,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zstandard filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ exp/cpu_jit.pt filter=lfs diff=lfs merge=lfs -text
+ exp/pretrained.pt filter=lfs diff=lfs merge=lfs -text
+ exp filter=lfs diff=lfs merge=lfs -text
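The three added lines route the new checkpoint files through Git LFS rather than plain git storage. For context (not part of this commit), attribute entries of this form are what `git lfs track` writes; a minimal sketch, assuming `git-lfs` is installed:

```
# One-time LFS setup in the clone, then track each large file.
git lfs install
git lfs track "exp/pretrained.pt"   # appends a filter=lfs line like the ones above
git lfs track "exp/cpu_jit.pt"
git add .gitattributes              # the updated attributes are committed alongside the models
```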
README.md CHANGED
@@ -51,7 +51,8 @@ The command for decoding is:
    --nbest-scale 0.5
  ```
  
- You can find the log in this repo: [log/log-decode-2021-11-09-17-38-28](log/log-decode-2021-11-09-17-38-28).
+ You can find the decoding log for the above command in this
+ repo: [log/log-decode-2021-11-09-17-38-28](log/log-decode-2021-11-09-17-38-28).
  
  The best WER for the librispeech test dataset is:
  
@@ -59,7 +60,7 @@ The best WER for the librispeech test dataset is:
  |-----|------------|------------|
  | WER | 2.42 | 5.73 |
  
- The best scale values are:
+ Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
  
  | ngram_lm_scale | attention_scale |
  |----------------|-----------------|
@@ -68,13 +69,14 @@ The best scale values are:
  
  # File description
  
- - `log/`, this directory contains the decoding log
- - `data/`, this directory contains files generated by `./prepare.sh`
+ - [log/](log), this directory contains the decoding log
+ - [test_wavs](test_wavs), this directory contains wave files for testing the pre-trained model
+ - [data/](data), this directory contains files generated by `./prepare.sh`
  
  Note: For the `data/lm` directory, we provide only `G_4_gram.pt`. If you need other files
  in this directory, please run `./prepare.sh`.
  
- - `exp`, this directory contains two files: `preprained.pt` and `cpu_jit.pt`.
+ - [exp](exp), this directory contains two files: `pretrained.pt` and `cpu_jit.pt`.
  
  `exp/pretrained.pt` is generated by the following command:
  ```
@@ -86,6 +88,13 @@ in this directory, please run `./prepare.sh`.
    --exp-dir conformer_ctc/exp_500_att0.8
  ```
  
+ **HINT**: To use `pretrained.pt` to compute the WER for test-clean and test-other,
+ copy it into the icefall experiment directory under the name `epoch-999.pt`:
+ ```
+ cp icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt /path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp/epoch-999.pt
+ ```
+ and pass `--epoch 999 --avg 1` to `conformer_ctc/decode.py`.
+ 
  `exp/cpu_jit.pt` is generated by the following command:
  ```
  ./conformer_ctc/export.py \
@@ -108,7 +117,73 @@ git checkout v2.0-pre
  mkdir build_release
  cd build_release
  cmake -DCMAKE_BUILD_TYPE=Release ..
- make -j ctc_decode ngram_lm_rescore attention_rescore
+ make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
+ ```
+ 
+ ## CTC decoding
+ ```
+ cd k2/build_release
+ ./bin/ctc_decode \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
+ ```
+ 
+ ## HLG decoding
+ 
+ ```
+ ./bin/hlg_decode \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
+   --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
  ```
  
+ ## HLG decoding + n-gram LM rescoring
+ 
+ **NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM.
+ A V100 GPU with 32 GB RAM is known to work.
+ 
+ ```
+ ./bin/ngram_lm_rescore \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
+   --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
+   --ngram_lm_scale 1.0 \
+   --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
+ ```
+ 
+ ## HLG decoding + n-gram LM rescoring + attention decoder rescoring
+ 
+ ```
+ ./bin/attention_rescore \
+   --use_gpu true \
+   --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
+   --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
+   --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
+   --ngram_lm_scale 2.0 \
+   --attention_scale 2.0 \
+   --num_paths 100 \
+   --nbest_scale 0.5 \
+   --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
+   --sos_id 1 \
+   --eos_id 1 \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
+   ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
+ ```
+ 
+ **NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM.
+ A V100 GPU with 32 GB RAM is known to work.
+ 
  [icefall]: https://github.com/k2-fsa/icefall
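A note on the two checkpoints used above: `exp/pretrained.pt` holds the weights consumed by icefall's Python `conformer_ctc/decode.py` (renamed to `epoch-999.pt` as in the HINT), while `exp/cpu_jit.pt` is a TorchScript export consumed by the k2 C++ binaries. As a quick sanity check that the TorchScript file is self-contained, it can be loaded without any icefall code; a minimal sketch, assuming only PyTorch is installed:

```
# Loads the TorchScript model on CPU and prints its type; no icefall checkout required.
python3 -c "import torch; m = torch.jit.load('exp/cpu_jit.pt'); print(type(m))"
```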
exp/pretrained.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11b6dd6bc02557030840d729923b9ae3e6db3ae665f048fb0056609c205a9ef9
+ size 437166367
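The three `+` lines above are a Git LFS pointer, not the 437 MB checkpoint itself; a clone without LFS yields only this stub. A downloaded copy can be checked against the pointer's hash and size; a minimal sketch, assuming GNU coreutils for `stat -c`:

```
git lfs pull                    # fetch the real file behind the pointer
sha256sum exp/pretrained.pt     # expect 11b6dd6bc02557030840d729923b9ae3e6db3ae665f048fb0056609c205a9ef9
stat -c %s exp/pretrained.pt    # expect 437166367 (bytes)
```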