# Introduction

## How to clone this repo

```
sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09
git lfs pull
```

**Caution**: You have to run `git lfs pull`. Otherwise, you will be SAD later.

-----

## Description

This repo provides a pre-trained conformer CTC model for the LibriSpeech dataset
using [icefall][icefall].

The commands for training are:

```
cd egs/librispeech/ASR
./prepare.sh

export CUDA_VISIBLE_DEVICES="0,1,2,3"

./conformer_ctc/train.py \
  --exp-dir conformer_ctc/exp_500_att0.8 \
  --lang-dir data/lang_bpe_500 \
  --att-rate 0.8 \
  --full-libri 1 \
  --max-duration 200 \
  --concatenate-cuts 0 \
  --world-size 4 \
  --bucketing-sampler 1 \
  --start-epoch 0 \
  --num-epochs 90
```

The command for decoding is:

```
./conformer_ctc/decode.py \
  --exp-dir conformer_ctc/exp_500_att0.8 \
  --lang-dir data/lang_bpe_500 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --epoch 77 \
  --avg 55 \
  --method attention-decoder \
  --nbest-scale 0.5
```

You can find the decoding log for the above command in this repo:
[log/log-decode-2021-11-09-17-38-28](log/log-decode-2021-11-09-17-38-28).

The best WERs for the LibriSpeech test sets are:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 2.0            | 2.0             |

# File description

- [log][log], this directory contains the decoding log
- [test_wavs][test_wavs], this directory contains wave files for testing the pre-trained model
- [data][data], this directory contains files generated by [prepare.sh][prepare]

  Note: For the `data/lm` directory, we provide only `G_4_gram.pt`. If you need
  other files in this directory, please run [prepare.sh][prepare].

- [exp][exp], this directory contains two files: `pretrained.pt` and `cpu_jit.pt`.

`exp/pretrained.pt` is generated by the following command:

```
./conformer_ctc/export.py \
  --epoch 77 \
  --avg 55 \
  --jit 0 \
  --lang-dir data/lang_bpe_500 \
  --exp-dir conformer_ctc/exp_500_att0.8
```

**HINT**: To use `pretrained.pt` to compute the WER for test-clean and test-other,
just do the following:

```
cp icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/conformer_ctc/exp/epoch-999.pt
```

and pass `--epoch 999 --avg 1` to `conformer_ctc/decode.py`.
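For example, assuming the checkpoint was copied to `conformer_ctc/exp/epoch-999.pt` as above, a decoding run could look like the sketch below. The options simply mirror the decoding command shown earlier; only `--exp-dir`, `--epoch`, and `--avg` differ, and you should adjust them to your own setup:

```
cd /path/to/icefall/egs/librispeech/ASR

./conformer_ctc/decode.py \
  --exp-dir conformer_ctc/exp \
  --lang-dir data/lang_bpe_500 \
  --max-duration 30 \
  --num-paths 1000 \
  --epoch 999 \
  --avg 1 \
  --method attention-decoder \
  --nbest-scale 0.5
```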
`exp/cpu_jit.pt` is generated by the following command:

```
./conformer_ctc/export.py \
  --epoch 77 \
  --avg 55 \
  --jit 1 \
  --lang-dir data/lang_bpe_500 \
  --exp-dir conformer_ctc/exp_500_att0.8
```

# Deploy your model in C++ using k2

To deploy your model in C++ using k2 without depending on Python, do the following:

```
# Note: It requires torch >= 1.8.0
git clone https://github.com/k2-fsa/k2
cd k2
git checkout v2.0-pre

mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j ctc_decode hlg_decode ngram_lm_rescore attention_rescore
```
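Before running the decoding binaries below, it is worth checking that `git lfs pull` actually fetched the model weights and that the exported TorchScript model loads. The sketch below assumes this model repo has been cloned (with `git lfs pull`) into `k2/build_release`, which is what the relative paths in the following commands expect; if `git lfs pull` was skipped, `cpu_jit.pt` will be only a tiny LFS pointer file:

```
cd k2/build_release

# A kilobyte-sized file here means the LFS objects were not downloaded.
ls -lh ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt

# Confirm the checkpoint is a loadable TorchScript module.
python3 -c "import torch; torch.jit.load('./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt', map_location='cpu'); print('ok')"
```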
## CTC decoding

```
cd k2/build_release

./bin/ctc_decode \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --bpe_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/bpe.model \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

## HLG decoding

```
./bin/hlg_decode \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

## HLG decoding + n-gram LM rescoring

**NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM. A V100 GPU with 32 GB RAM is known to work.

```
./bin/ngram_lm_rescore \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
  --ngram_lm_scale 1.0 \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

## HLG decoding + n-gram LM rescoring + attention decoder rescoring

**NOTE**: A V100 GPU with 16 GB RAM is known NOT to work because of OOM. A V100 GPU with 32 GB RAM is known to work.

```
./bin/attention_rescore \
  --use_gpu true \
  --nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --hlg ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --g ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lm/G_4_gram.pt \
  --ngram_lm_scale 2.0 \
  --attention_scale 2.0 \
  --num_paths 100 \
  --nbest_scale 0.5 \
  --word_table ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/words.txt \
  --sos_id 1 \
  --eos_id 1 \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav
```

[prepare]: https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare.sh
[exp]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/exp
[data]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/data
[test_wavs]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/test_wavs
[log]: https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main/log
[icefall]: https://github.com/k2-fsa/icefall
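As a rough guide to what `--ngram_lm_scale` and `--attention_scale` control (and to the scale table near the top of this page): in icefall's n-best rescoring, the scores of each candidate path are combined linearly, and the k2 binaries are expected to follow the same formulation. Treat the expression below as an assumption to verify against the icefall/k2 source rather than a specification:

```
tot_score(path) = am_score(path)
                + ngram_lm_scale * ngram_lm_score(path)
                + attention_scale * attention_score(path)
```

The path with the highest total score is kept as the final hypothesis; the 2.0/2.0 pair in the table above is the setting that gave the best WERs on test-clean and test-other.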