
Introduction

How to clone this repo

sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-bpe-500-2021-12-17

cd icefall-asr-librispeech-transducer-bpe-500-2021-12-17
git lfs pull

Caution: You have to run git lfs pull. Otherwise, the large files will remain tiny LFS pointer files and you will be SAD later.
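
To verify that the LFS files were actually downloaded, a quick sanity check (an un-pulled LFS pointer is only a few hundred bytes, while the real checkpoint is much larger):

# Show which files are tracked by git-lfs
git lfs ls-files

# pretrained.pt should be far larger than a pointer file
ls -lh exp/pretrained.pt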

The model in this repo was trained using icefall commit cb04c8a7509425ab45fae888b0ca71bbbd23f0de.

You can download icefall and check out that commit with:

git clone https://github.com/k2-fsa/icefall
cd icefall
git checkout cb04c8a7509425ab45fae888b0ca71bbbd23f0de

You can find the model information by visiting https://github.com/k2-fsa/icefall/blob/cb04c8a7509425ab45fae888b0ca71bbbd23f0de/egs/librispeech/ASR/transducer/train.py#L196

In short, the encoder is a Conformer model with 8 heads, 12 encoder layers, 512-dim attention, 2048-dim feedforward; the decoder contains a 1024-dim embedding layer, plus a 4-layer LSTM with hidden size 512.
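
If you want to verify these numbers against the code, you can grep for them in train.py at that commit. The parameter names in the pattern below (attention_dim, nhead, etc.) are an assumption about what the file calls them, so adjust the pattern if needed:

cd /path/to/icefall
git checkout cb04c8a7509425ab45fae888b0ca71bbbd23f0de
grep -n "attention_dim\|nhead\|num_encoder_layers\|dim_feedforward" \
  egs/librispeech/ASR/transducer/train.py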


Description

This repo provides a pre-trained RNN-T Conformer model for the LibriSpeech dataset, trained using icefall.

The commands for training are:

cd egs/librispeech/ASR/
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./transducer/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 0 \
  --exp-dir transducer/exp-lr-2.5-full \
  --full-libri 1 \
  --max-duration 250 \
  --lr-factor 2.5
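
If you have a different number of GPUs, adjust CUDA_VISIBLE_DEVICES and --world-size together. For example, a single-GPU run might look like the following sketch (the lower --max-duration is an illustrative guess for a smaller memory budget, not a tuned value):

export CUDA_VISIBLE_DEVICES="0"
./transducer/train.py \
  --world-size 1 \
  --num-epochs 30 \
  --start-epoch 0 \
  --exp-dir transducer/exp-lr-2.5-full \
  --full-libri 1 \
  --max-duration 150 \
  --lr-factor 2.5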

The command for decoding is:

epoch=26
avg=12

./transducer/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir transducer/exp-lr-2.5-full \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 100

You can find the decoding log for the above command in this repo: log/log-decode-epoch-26-avg-12-2021-12-17-09-33-04.
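
The epoch/avg pair above is the best combination that was found. If you retrain the model, you can search for your own best pair with a simple sweep like this sketch (the ranges are illustrative; with --num-epochs 30 and --start-epoch 0, the last checkpoint should be epoch-29.pt):

for epoch in 24 25 26 27 28 29; do
  for avg in 8 10 12 14; do
    ./transducer/decode.py \
      --epoch $epoch \
      --avg $avg \
      --exp-dir transducer/exp-lr-2.5-full \
      --bpe-model ./data/lang_bpe_500/bpe.model \
      --max-duration 100
  done
done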

The best WER using greedy search is:

            test-clean   test-other
WER (%)     3.16         7.71

File description

  • log: contains the decoding log and decoding results
  • test_wavs: contains wave files for testing the pre-trained model
  • data: contains files generated by prepare.sh
  • exp: contains only one file, pretrained.pt

exp/pretrained.pt is generated by the following command:

./transducer/export.py \
  --epoch 26 \
  --avg 12 \
  --bpe-model data/lang_bpe_500/bpe.model \
  --exp-dir transducer/exp-lr-2.5-full
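
To quickly check what the exported file contains, a one-liner like this works. It assumes pretrained.pt is a torch.save'd dict with a "model" entry holding the state dict, which is how icefall's export scripts typically save it:

# Print the top-level keys of the exported checkpoint
python3 -c "import torch; ckpt = torch.load('exp/pretrained.pt', map_location='cpu'); print(list(ckpt.keys()))"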

HINT: To use pretrained.pt to compute the WER for test-clean and test-other, just do the following:

cp icefall-asr-librispeech-transducer-bpe-500-2021-12-17/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/transducer/exp/epoch-999.pt

and pass --epoch 999 --avg 1 to transducer/decode.py.
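
Concretely, after the copy above, a decoding run using the pre-trained checkpoint could look like this (note that --exp-dir now points at transducer/exp, where epoch-999.pt lives; this assumes prepare.sh has already generated data/lang_bpe_500/bpe.model):

cd /path/to/icefall/egs/librispeech/ASR
./transducer/decode.py \
  --epoch 999 \
  --avg 1 \
  --exp-dir transducer/exp \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 100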
