Introduction
How to clone this repo
sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull
Caution: You have to run git lfs pull. Otherwise, you will be SAD later.
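If git lfs pull has not been run, large files in the clone are small text pointer files rather than the real data. The following is a minimal sketch for spotting that situation, not part of this repo. It relies on the fact that a Git LFS pointer file starts with the line "version https://git-lfs.github.com/spec/v1". The helper names and the "*.pt" pattern are assumptions made for illustration; which files are tracked by LFS is not stated here.

```python
from pathlib import Path

# First line of every Git LFS pointer file (pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` still holds an LFS pointer rather than real data."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unpulled_files(repo_dir, patterns=("*.pt",)):
    """List files under `repo_dir` matching `patterns` that are still pointers.

    The default pattern is an assumption for illustration only.
    """
    root = Path(repo_dir)
    unpulled = []
    for pattern in patterns:
        for p in sorted(root.rglob(pattern)):
            if p.is_file() and is_lfs_pointer(p):
                unpulled.append(p)
    return unpulled


if __name__ == "__main__":
    repo = "icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09"
    for p in find_unpulled_files(repo):
        print(f"still an LFS pointer, run `git lfs pull`: {p}")
```

An empty result only means that no matching file looks like a pointer; it does not verify that the files are complete.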
Description
This repo provides a pre-trained conformer CTC model for the LibriSpeech dataset, trained using icefall.
The commands for training are:
cd egs/librispeech/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./conformer_ctc/train.py \
--exp-dir conformer_ctc/exp_500_att0.8 \
--lang-dir data/lang_bpe_500 \
--att-rate 0.8 \
--full-libri 1 \
--max-duration 200 \
--concatenate-cuts 0 \
--world-size 4 \
--bucketing-sampler 1 \
--start-epoch 0 \
--num-epochs 90
The command for decoding is:
./conformer_ctc/decode.py \
--exp-dir conformer_ctc/exp_500_att0.8 \
--lang-dir data/lang_bpe_500 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 77 \
--avg 55 \
--method attention-decoder \
--nbest-scale 0.5
You can find the decoding log for the above command in this repo: log/log-decode-2021-11-09-17-38-28.
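The --epoch 77 --avg 55 options in the decoding command select which checkpoints are averaged before decoding. The sketch below shows one reading of those options, assuming the usual convention that the last `avg` epoch checkpoints ending at `epoch` (inclusive) are used and that checkpoints are named epoch-N.pt. The function name is hypothetical and this is not the code from decode.py.

```python
def checkpoints_to_average(exp_dir, epoch, avg):
    """Return the checkpoint paths covered by --epoch and --avg.

    Assumption: the last `avg` epochs ending at `epoch` (inclusive) are
    averaged, and checkpoints are stored as <exp_dir>/epoch-<N>.pt.
    """
    start = epoch - avg + 1
    if avg < 1 or start < 0:
        raise ValueError(f"invalid combination: epoch={epoch}, avg={avg}")
    return [f"{exp_dir}/epoch-{i}.pt" for i in range(start, epoch + 1)]


files = checkpoints_to_average("conformer_ctc/exp_500_att0.8", epoch=77, avg=55)
print(len(files), files[0], files[-1])
# prints: 55 conformer_ctc/exp_500_att0.8/epoch-23.pt conformer_ctc/exp_500_att0.8/epoch-77.pt
```

Under this assumption, the reported results come from an average of epochs 23 through 77.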
The best WERs on the LibriSpeech test sets are:
Metric | test-clean | test-other |
---|---|---|
WER (%) | 2.42 | 5.73 |
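For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions + deletions + insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below illustrates the definition for a single utterance. It is not the scoring code used to produce the numbers above, and the function name is made up for illustration.

```python
def word_error_rate(ref, hyp):
    """Return the WER in percent between a reference and a hypothesis string."""
    ref_words = ref.split()
    hyp_words = hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        cur = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            cur.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    cur[j - 1] + 1,      # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = cur
    return 100.0 * prev[-1] / len(ref_words)


# One substitution (b -> x) and one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))  # prints 50.0
```

For a whole test set, the errors and reference words are normally summed over all utterances before dividing, rather than averaging per-utterance rates.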
The scale values used in n-gram LM rescoring and attention rescoring to obtain the best WERs are:
ngram_lm_scale | attention_scale |
---|---|
2.0 | 2.0 |
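To make the role of the two scales concrete, the sketch below ranks candidate paths by a weighted sum of three scores. The weighted-sum form is an assumption about how the scales enter the rescoring; the function names and all score values are invented for illustration and do not come from the decoding log.

```python
def total_score(am, ngram_lm, attention, ngram_lm_scale=2.0, attention_scale=2.0):
    """Combine per-path scores (log-domain, higher is better).

    Assumed form: acoustic score plus scaled n-gram LM and attention scores.
    """
    return am + ngram_lm_scale * ngram_lm + attention_scale * attention


def pick_best(paths, ngram_lm_scale=2.0, attention_scale=2.0):
    """Return the path with the highest combined score."""
    return max(
        paths,
        key=lambda p: total_score(
            p["am"], p["ngram_lm"], p["attention"], ngram_lm_scale, attention_scale
        ),
    )


# Toy n-best list with made-up scores.
paths = [
    {"text": "hypothesis A", "am": -10.0, "ngram_lm": -4.0, "attention": -3.0},
    {"text": "hypothesis B", "am": -9.5, "ngram_lm": -6.0, "attention": -3.5},
]
print(pick_best(paths)["text"])  # prints hypothesis A
```

With both scales set to 0.0 the toy example would pick hypothesis B instead, because only the acoustic score would count.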
File description
- log, this directory contains the decoding log
- test_wavs, this directory contains wave files for testing the pre-trained model
- data, this directory contains files generated by prepare.sh
Note: For the data/lm directory, we provide only G_4_gram.pt. If you need other files
in this directory, please run prepare.sh.
- exp, this directory contains two files: pretrained.pt and cpu_jit.pt.
exp/pretrained.pt is generated by the following command:
./conformer_ctc/export.py \
--epoch 77 \
--avg 55 \
--jit 0 \
--lang-dir data/lang_bpe_500 \
--exp-dir conformer_ctc/exp_500_att0.8
HINT: To use pretrained.pt to compute the WER for test-clean and test-other,
just do the following:
cp icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt \
/path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp/epoch-999.pt
and pass --epoch 999 --avg 1 to conformer_ctc/decode.py, with --exp-dir pointing at the directory that contains epoch-999.pt.
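The copy step in the hint can also be scripted. The sketch below copies pretrained.pt to epoch-999.pt in a chosen experiment directory and returns the matching decode options. The function name is hypothetical, and the paths in the usage example are placeholders to adapt.

```python
import shutil
from pathlib import Path


def stage_pretrained(pretrained, exp_dir, epoch=999):
    """Copy `pretrained` to <exp_dir>/epoch-<epoch>.pt.

    Returns the target path and the options to pass to decode.py so that
    exactly this one checkpoint is loaded (--avg 1 means no averaging).
    """
    exp_dir = Path(exp_dir)
    exp_dir.mkdir(parents=True, exist_ok=True)
    target = exp_dir / f"epoch-{epoch}.pt"
    shutil.copyfile(pretrained, target)
    return target, ["--epoch", str(epoch), "--avg", "1"]


if __name__ == "__main__":
    # Placeholder paths; adjust them to your checkout.
    target, decode_args = stage_pretrained(
        "icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt",
        "/path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp",
    )
    print(target, " ".join(decode_args))
```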
exp/cpu_jit.pt is generated by the following command:
./conformer_ctc/export.py \
--epoch 77 \
--avg 55 \
--jit 1 \
--lang-dir data/lang_bpe_500 \
--exp-dir conformer_ctc/exp_500_att0.8
Deploy your model in C++ using k2
To deploy your model in C++ using k2 without depending on Python, do the following:
# Note: It requires torch >= 1.8.0
git clone https://github.com/k2-fsa/k2
cd k2
git checkout v2.0-pre
mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
CTC decoding
cd k2/build_release
./bin/ctc_decode \
--use_gpu true \
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
--bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
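Because the same long repo path is repeated in every argument, it can be convenient to assemble the command from a single root directory. The sketch below builds the ctc_decode invocation shown above as an argument list, using only the flags that appear in that command. The function name is hypothetical, and nothing is executed unless you pass the list to subprocess yourself.

```python
import shlex
from pathlib import PurePosixPath

REPO = "./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09"
TEST_WAVS = ["1089-134686-0001.wav", "1221-135766-0001.wav", "1221-135766-0002.wav"]


def ctc_decode_command(repo_dir, wav_names, binary="./bin/ctc_decode"):
    """Build the argument list for ctc_decode from one repo root directory."""
    repo = PurePosixPath(repo_dir)
    cmd = [
        binary,
        "--use_gpu", "true",
        "--nn_model", str(repo / "exp" / "cpu_jit.pt"),
        "--bpe_model", str(repo / "data" / "lang_bpe_500" / "bpe.model"),
    ]
    # Wave files are positional arguments and come last.
    cmd += [str(repo / "test_wavs" / name) for name in wav_names]
    return cmd


cmd = ctc_decode_command(REPO, TEST_WAVS)
print(shlex.join(cmd))
```

To run it from k2/build_release, something like subprocess.run(cmd, check=True) should work, provided the binaries have been built and the repo has been cloned into that directory.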
HLG decoding
./bin/hlg_decode \
--use_gpu true \
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
HLG decoding + n-gram LM rescoring
NOTE: A V100 GPU with 16 GB of memory is known NOT to work because it runs out of memory (OOM). A V100 GPU with 32 GB of memory is known to work.
./bin/ngram_lm_rescore \
--use_gpu true \
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
--g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
--ngram_lm_scale 1.0 \
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
HLG decoding + n-gram LM rescoring + attention decoder rescoring
NOTE: A V100 GPU with 16 GB of memory is known NOT to work because it runs out of memory (OOM). A V100 GPU with 32 GB of memory is known to work.
./bin/attention_rescore \
--use_gpu true \
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
--hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
--g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
--ngram_lm_scale 2.0 \
--attention_scale 2.0 \
--num_paths 100 \
--nbest_scale 0.5 \
--word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
--sos_id 1 \
--eos_id 1 \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav