Introduction

How to clone this repo

sudo apt-get install git-lfs
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-bpe-500-2021-12-23

cd icefall-asr-librispeech-transducer-bpe-500-2021-12-23
git lfs pull

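Caution: You have to run git lfs pull. Otherwise, large files such as the model checkpoint may be left as small Git LFS pointer files instead of the real data, and you will be SAD later.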
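To confirm that git lfs pull really fetched the checkpoint, you can check whether the file is still a Git LFS pointer. The sketch below assumes that exp/pretrained.pt is tracked by Git LFS in this repo and that you are in the repo root. is_lfs_pointer is a hypothetical helper name, not a git-lfs command.

```shell
# Hypothetical helper: succeed (exit 0) if the file is still a Git LFS pointer.
# Pointer files are small text files whose first line names the LFS spec.
is_lfs_pointer() {
  head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Assumed location of the checkpoint, relative to the repo root.
ckpt=exp/pretrained.pt
if is_lfs_pointer "$ckpt"; then
  echo "$ckpt is still an LFS pointer; run: git lfs pull"
fi
```

If the message is printed, run git lfs pull again inside the repo before trying to load the model.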

The model in this repo was trained using icefall at commit 5b6699a8354b70b23b252b371c612a35ed186ec2.

You can use

git clone https://github.com/k2-fsa/icefall
cd icefall
git checkout 5b6699a8354b70b23b252b371c612a35ed186ec2

to download icefall.

You can find the model information by visiting https://github.com/k2-fsa/icefall/blob/5b6699a8354b70b23b252b371c612a35ed186ec2/egs/librispeech/ASR/transducer/train.py#L191

In short, the encoder is a Conformer model with 8 heads, 12 encoder layers, 512-dim attention, 2048-dim feedforward; the decoder contains a 1024-dim embedding layer, plus a 2-layer LSTM with hidden size 512.

Description

This repo provides a pre-trained RNN-T Conformer model for the LibriSpeech dataset, trained using icefall.

The commands for training are:

cd egs/librispeech/ASR/
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./transducer/train.py \
  --world-size 4 \
  --num-epochs 35 \
  --start-epoch 0 \
  --exp-dir transducer/exp-lr-2.5-full \
  --full-libri 1 \
  --max-duration 180 \
  --lr-factor 2.5
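The training command above uses four visible GPUs together with --world-size 4. If you train on a different number of GPUs, one way to keep the two in sync is to derive the world size from CUDA_VISIBLE_DEVICES. This is a minimal sketch under the assumption that --world-size should equal the number of visible devices. count_visible_devices is a hypothetical helper, not part of icefall.

```shell
# Hypothetical helper: count the comma-separated device IDs in a
# CUDA_VISIBLE_DEVICES-style string (an empty string counts as 0).
count_visible_devices() {
  # "|| true" because grep exits non-zero when the count is 0.
  printf '%s\n' "$1" | tr ',' '\n' | grep -c . || true
}

export CUDA_VISIBLE_DEVICES="0,1,2,3"
world_size=$(count_visible_devices "$CUDA_VISIBLE_DEVICES")

# Show the flag that would be passed to ./transducer/train.py.
printf '%s\n' "--world-size $world_size"
```

You could then pass --world-size "$world_size" to ./transducer/train.py instead of hard-coding the value.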

The command for decoding is:

epoch=34
avg=11

./transducer/decode.py \
  --epoch $epoch \
  --avg $avg \
  --exp-dir transducer/exp-lr-2.5-full \
  --bpe-model ./data/lang_bpe_500/bpe.model \
  --max-duration 100
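To see which checkpoints the --epoch and --avg options refer to, the sketch below lists the files that would be averaged. It assumes that decode.py averages the last $avg epoch checkpoints ending at $epoch and that they are named epoch-N.pt inside the experiment directory. Check decode.py at the commit above to confirm. list_averaged_checkpoints is a hypothetical helper, not part of icefall.

```shell
# Hypothetical helper: print the checkpoint files covered by --epoch/--avg,
# assuming the last <avg> checkpoints ending at <epoch> are averaged.
# Usage: list_averaged_checkpoints <epoch> <avg> <exp-dir>
list_averaged_checkpoints() {
  lac_last=$1
  lac_dir=$3
  lac_i=$((lac_last - $2 + 1))
  while [ "$lac_i" -le "$lac_last" ]; do
    # Skip negative indices when avg is larger than epoch + 1.
    if [ "$lac_i" -ge 0 ]; then
      echo "$lac_dir/epoch-$lac_i.pt"
    fi
    lac_i=$((lac_i + 1))
  done
}

list_averaged_checkpoints 34 11 transducer/exp-lr-2.5-full
```

Under that assumption, epoch=34 with avg=11 covers epoch-24.pt through epoch-34.pt, and --epoch 999 --avg 1 covers only epoch-999.pt.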

You can find the decoding log for the above command in the log folder of this repo.

The best WER using greedy search is:

	test-clean	test-other
WER (%)	3.07	7.51

File description

log, this directory contains the decoding log and decoding results
test_wavs, this directory contains wave files for testing the pre-trained model
data, this directory contains files generated by prepare.sh
exp, this directory contains only one file: pretrained.pt

exp/pretrained.pt is generated by the following command:

./transducer/export.py \
  --epoch 34 \
  --avg 11 \
  --bpe-model data/lang_bpe_500/bpe.model \
  --exp-dir transducer/exp-lr-2.5-full

HINT: To use pretrained.pt to compute the WER for test-clean and test-other, just do the following:

cp icefall-asr-librispeech-transducer-bpe-500-2021-12-23/exp/pretrained.pt \
  /path/to/icefall/egs/librispeech/ASR/transducer/exp/epoch-999.pt

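and pass --epoch 999 --avg 1 to transducer/decode.py, with --exp-dir pointing at the directory that holds epoch-999.pt (transducer/exp in the command above).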