Introduction

How to clone this repo

sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull

Catuion: You have to run git lfs pull. Otherwise, you will be SAD later.

Description

This repo provides pre-trained conformer CTC model for the librispeech dataset using icefall.

The commands for training are:

cd egs/librispeech/ASR/conformer_ctc
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./conformer_ctc/train.py \
  --exp-dir conformer_ctc/exp_500_att0.8 \
  --lang-dir data/lang_bpe_500 \
  --att-rate 0.8 \
  --full-libri 1 \
  --max-duration 200 \
  --concatenate-cuts 0 \
  --world-size 4 \
  --bucketing-sampler 1 \
  --start-epoch 0 \
  --num-epochs 90

The command for decoding is:

./conformer_ctc/decode.py \
  --exp-dir conformer_ctc/exp_500_att0.8 \
  --lang-dir data/lang_bpe_500 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --epoch 77 \
  --avg 55 \
  --method attention-decoder \
  --nbest-scale 0.5

You can find the decoding log for the above command in this repo: log/log-decode-2021-11-09-17-38-28.

The best WER for the librispeech test dataset is:

	test-clean	test-other
WER	2.42	5.73

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

ngram_lm_scale	attention_scale
2.0	2.0

File description

log, this directory contains the decoding log
test_wavs, this directory contains wave files for testing the pre-trained model
data, this directory contains files generated by prepare.sh

Note: For the data/lm directory, we provide only G_4_gram.pt. If you need other files in this directory, please run prepare.sh.

exp, this directory contains two files: preprained.pt and cpu_jit.pt.

exp/pretrained.pt is generated by the following command:

./conformer_ctc/export.py \
  --epoch 77 \
  --avg 55 \
  --jit 0 \ 
  --lang-dir data/lang_bpe_500 \
  --exp-dir conformer_ctc/exp_500_att0.8

HINT: To use pre-trained.pt to compute the WER for test-clean and test-other, just do the following:

cp icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp/epoch-999.pt

and pass --epoch 999 --avg 1 to conformer_ctc/decode.py.

exp/cpu_jit.pt is generated by the following command:

./conformer_ctc/export.py \
  --epoch 77 \
  --avg 55 \
  --jit 1 \ 
  --lang-dir data/lang_bpe_500 \
  --exp-dir conformer_ctc/exp_500_att0.8

Deploy your model in C++ using k2

To deploy your model in C++ using k2 without depending on Python, do the following:

# Note: It requires torch >= 1.8.0
git clone https://github.com/k2-fsa/k2
cd k2
git checkout v2.0-pre
mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore

CTC decoding

cd k2/build_release
./bin/ctc_decode \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

HLG decoding

./bin/hlg_decode \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

HLG decoding + n-gram LM rescoring

NOTE: V100 GPU with 16 GB RAM is known NOT to work because of OOM. V100 GPU with 32 GB RAM is known to work.

./bin/ngram_lm_rescore \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
  --ngram_lm_scale 1.0 \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

HLG decoding + n-gram LM rescoring + attention decoder rescoring

NOTE: V100 GPU with 16 GB RAM is known NOT to work because of OOM. V100 GPU with 32 GB RAM is known to work.

./bin/attention_rescore \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
  --ngram_lm_scale 2.0 \
  --attention_scale 2.0 \
  --num_paths 100 \
  --nbest_scale 0.5 \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  --sos_id 1 \
  --eos_id 1 \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav