# Introduction

## How to clone this repo

```
sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull
```

**Caution**: You have to run `git lfs pull`. Otherwise, you will be SAD later.

-----

## Description

This repo provides a pre-trained conformer CTC model for the LibriSpeech dataset
using [icefall][icefall].

The commands for training are:

```
cd egs/librispeech/ASR
./prepare.sh

export CUDA_VISIBLE_DEVICES="0,1,2,3"

./conformer_ctc/train.py \
  --exp-dir conformer_ctc/exp_500_att0.8 \
  --lang-dir data/lang_bpe_500 \
  --att-rate 0.8 \
  --full-libri 1 \
  --max-duration 200 \
  --concatenate-cuts 0 \
  --world-size 4 \
  --bucketing-sampler 1 \
  --start-epoch 0 \
  --num-epochs 90
```

The command for decoding is:

```
./conformer_ctc/decode.py \
  --exp-dir conformer_ctc/exp_500_att0.8 \
  --lang-dir data/lang_bpe_500 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --epoch 77 \
  --avg 55 \
  --method attention-decoder \
  --nbest-scale 0.5
```

You can find the decoding log for the above command in this repo:
[log/log-decode-2021-11-09-17-38-28](log/log-decode-2021-11-09-17-38-28).

The best WERs for the LibriSpeech test sets are:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 2.0            | 2.0             |

# File description

- [log][log], this directory contains the decoding log
- [test_wavs][test_wavs], this directory contains wave files for testing the pre-trained model
- [data][data], this directory contains files generated by [prepare.sh][prepare]

  Note: For the `data/lm` directory, we provide only `G_4_gram.pt`. If you need
  other files in this directory, please run [prepare.sh][prepare].

- [exp][exp], this directory contains two files: `pretrained.pt` and `cpu_jit.pt`.

`exp/pretrained.pt` is generated by the following command:

```
./conformer_ctc/export.py \
  --epoch 77 \
  --avg 55 \
  --jit 0 \
  --lang-dir data/lang_bpe_500 \
  --exp-dir conformer_ctc/exp_500_att0.8
```

**HINT**: To use `pretrained.pt` to compute the WER for test-clean and test-other,
just do the following:

```
cp icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp/epoch-999.pt
```

and pass `--epoch 999 --avg 1` to `conformer_ctc/decode.py`.
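For example, assuming the checkpoint was copied to `conformer_ctc/exp/epoch-999.pt` as above, a decoding run could look like the sketch below. The options simply mirror the decoding command shown earlier; only `--exp-dir`, `--epoch`, and `--avg` differ, and you should adjust them to your own setup:

```
cd /path/to/icefall/egs/librispeech/ASR

./conformer_ctc/decode.py \
  --exp-dir conformer_ctc/exp \
  --lang-dir data/lang_bpe_500 \
  --max-duration 30 \
  --num-paths 1000 \
  --epoch 999 \
  --avg 1 \
  --method attention-decoder \
  --nbest-scale 0.5
```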
`exp/cpu_jit.pt` is generated by the following command:

```
./conformer_ctc/export.py \
  --epoch 77 \
  --avg 55 \
  --jit 1 \
  --lang-dir data/lang_bpe_500 \
  --exp-dir conformer_ctc/exp_500_att0.8
```

# Deploy your model in C++ using k2

To deploy your model in C++ using k2 without depending on Python, do the following:

```
# Note: It requires torch >= 1.8.0
git clone https://github.com/k2-fsa/k2
cd k2
git checkout v2.0-pre

mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
```
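Before running the decoding binaries below, it is worth checking that `git lfs pull` actually fetched the model weights and that the exported TorchScript model loads. The sketch below assumes this model repo has been cloned (with `git lfs pull`) into `k2/build_release`, which is what the relative paths in the following commands expect; if `git lfs pull` was skipped, `cpu_jit.pt` will be only a tiny LFS pointer file:

```
cd k2/build_release

# A kilobyte-sized file here means the LFS objects were not downloaded.
ls -lh ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt

# Confirm the checkpoint is a loadable TorchScript module.
python3 -c "import torch; torch.jit.load('./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt', map_location='cpu'); print('ok')"
```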
## CTC decoding

```
cd k2/build_release

./bin/ctc_decode \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

## HLG decoding

```
./bin/hlg_decode \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

## HLG decoding + n-gram LM rescoring

**NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM. A V100 GPU with 32 GB RAM is known to work.

```
./bin/ngram_lm_rescore \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
  --ngram_lm_scale 1.0 \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

## HLG decoding + n-gram LM rescoring + attention decoder rescoring

**NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM. A V100 GPU with 32 GB RAM is known to work.

```
./bin/attention_rescore \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
  --ngram_lm_scale 2.0 \
  --attention_scale 2.0 \
  --num_paths 100 \
  --nbest_scale 0.5 \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  --sos_id 1 \
  --eos_id 1 \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

[prepare]: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
[exp]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/exp
[data]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/data
[test_wavs]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/test_wavs
[log]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/log
[icefall]: https://github.com/k2-fsa/icefall
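As a rough guide to what `--ngram_lm_scale` and `--attention_scale` control (and to the scale table near the top of this page): in icefall's n-best rescoring, the scores of each candidate path are combined linearly, and the k2 binaries are expected to follow the same formulation. Treat the expression below as an assumption to verify against the icefall/k2 source rather than a specification:

```
tot_score(path) = am_score(path)
                + ngram_lm_scale * ngram_lm_score(path)
                + attention_scale * attention_score(path)
```

The path with the highest total score is kept as the final hypothesis; the 2.0/2.0 pair in the table above is the setting that gave the best WERs on test-clean and test-other.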