
Note: This recipe was trained with the code from this PR: https://github.com/k2-fsa/icefall/pull/349

Pre-trained Transducer-Stateless2 models for the WenetSpeech dataset with icefall.

The model was trained on the L subset of WenetSpeech with the scripts in icefall, based on the latest version of k2.

Training procedure

The main repositories are listed below; the training and decoding scripts will be updated as these projects evolve.
k2: https://github.com/k2-fsa/k2
icefall: https://github.com/k2-fsa/icefall
lhotse: https://github.com/lhotse-speech/lhotse

git clone https://github.com/k2-fsa/icefall
cd icefall
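
Before running the recipe, lhotse and icefall's Python requirements need to be installed, along with a k2 build that matches your PyTorch/CUDA versions. A minimal sketch (see the k2 installation documentation for the details of your environment):

pip install lhotse
pip install -r requirements.txt
# Install k2 following https://k2-fsa.github.io/k2/installation/index.html,
# e.g. from a pre-built wheel matching your PyTorch/CUDA versions.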
  • Preparing data.
cd egs/wenetspeech/ASR
bash ./prepare.sh
  • Training
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
./pruned_transducer_stateless2/train.py \
  --world-size 8 \
  --num-epochs 15 \
  --start-epoch 0 \
  --exp-dir pruned_transducer_stateless2/exp \
  --lang-dir data/lang_char \
  --max-duration 180 \
  --valid-interval 3000 \
  --model-warm-step 3000 \
  --save-every-n 8000 \
  --training-subset L
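  • Decoding
After training, the evaluation results below can be reproduced with the recipe's decode script. A hedged sketch (the flag names follow other icefall pruned_transducer_stateless2 recipes; set --decoding-method to greedy_search, modified_beam_search, or fast_beam_search to match the rows of the table below, and adjust --max-duration per the comment column):

export CUDA_VISIBLE_DEVICES="0"
./pruned_transducer_stateless2/decode.py \
  --epoch 10 \
  --avg 2 \
  --exp-dir pruned_transducer_stateless2/exp \
  --lang-dir data/lang_char \
  --max-duration 100 \
  --decoding-method greedy_search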

Evaluation results

The decoding results (WER, %) on WenetSpeech (dev, test-net, and test-meeting) are listed below. The results were obtained by averaging the models from epochs 9 and 10. The WERs are:

| decoding method                     | dev   | test-net | test-meeting | comment                                  |
|-------------------------------------|-------|----------|--------------|------------------------------------------|
| greedy search                       | 7.80  | 8.75     | 13.49        | --epoch 10, --avg 2, --max-duration 100  |
| modified beam search (beam size 4)  | 7.76  | 8.71     | 13.41        | --epoch 10, --avg 2, --max-duration 100  |
| fast beam search (1best)            | 7.94  | 8.74     | 13.80        | --epoch 10, --avg 2, --max-duration 1500 |
| fast beam search (nbest)            | 9.82  | 10.98    | 16.37        | --epoch 10, --avg 2, --max-duration 600  |
| fast beam search (nbest oracle)     | 6.88  | 7.18     | 11.77        | --epoch 10, --avg 2, --max-duration 600  |
| fast beam search (nbest LG)         | 14.94 | 16.14    | 22.93        | --epoch 10, --avg 2, --max-duration 600  |
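
To try the released checkpoint directly on sound files, the recipe's pretrained.py script can be used. A hedged sketch (the repository path <this-repo> and the file names exp/pretrained.pt and test_wavs/test.wav are illustrative placeholders for the files actually shipped here; the --checkpoint, --lang-dir, and --method flags follow other icefall pretrained.py scripts):

git lfs install
# Replace <this-repo> with the path of this Hugging Face repository.
git clone https://huggingface.co/<this-repo>
./pruned_transducer_stateless2/pretrained.py \
  --checkpoint <this-repo>/exp/pretrained.pt \
  --lang-dir <this-repo>/data/lang_char \
  --method greedy_search \
  <this-repo>/test_wavs/test.wav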