metadata
tags:
  - espnet
  - audio
  - speech-recognition
language: zh
datasets:
  - commonvoice
license: cc-by-4.0

ESPnet2 ASR model

espnet/shihlun-asr-commonvoice-zh-TW

This model was trained by Shih-Lun Wu using the commonvoice recipe in espnet.

Demo: How to use in ESPnet2

cd espnet
pip install -e .
cd egs2/commonvoice/asr1
./asr.sh \
  --stage 1 \
  --stop_stage 13 \
  --nj 32 \
  --inference_nj 32 \
  --skip_train true \
  --train_set "train_zh_TW" \
  --valid_set "dev_zh_TW" \
  --test_sets "dev_zh_TW test_zh_TW" \
  --lang "zh_TW" \
  --local_data_opts "--lang zh-TW" \
  --speed_perturb_factors "0.9 1.0 1.1" \
  --lm_train_text "data/train_zh_TW/text" \
  --token_type bpe \
  --nbpe 2542 \
  --bpemode "unigram" \
  --bpe_train_text "data/train_zh_TW/text" \
  --use_lm false \
  --inference_asr_model "valid.acc.best.pth" \
  --download_model "espnet/shihlun-asr-commonvoice-zh-TW"
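
For transcribing a single file without running the full recipe, the following is a minimal Python sketch. It assumes that `espnet`, `espnet_model_zoo`, and `soundfile` are installed, and that the input audio matches the sampling rate the model was trained on. The helper name `transcribe` is hypothetical and not part of ESPnet.

```python
MODEL_TAG = "espnet/shihlun-asr-commonvoice-zh-TW"


def transcribe(wav_path: str, model_tag: str = MODEL_TAG) -> str:
    """Return the best hypothesis text for one audio file (hypothetical helper)."""
    # Imported lazily so this module can be loaded without ESPnet installed.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # from_pretrained downloads and caches the model; it needs espnet_model_zoo.
    speech2text = Speech2Text.from_pretrained(model_tag)

    # The returned sampling rate is ignored here; no resampling is done
    # (assumption: the file already has the rate the model expects).
    speech, _rate = soundfile.read(wav_path)

    # Each n-best entry is (text, tokens, token_ids, hypothesis); take the top one.
    text, *_ = speech2text(speech)[0]
    return text
```

A call such as `print(transcribe("sample.wav"))` would then print the recognized text, where `sample.wav` stands in for any local recording.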

RESULTS

Environments

  • date: Thu Sep 1 21:49:10 UTC 2022
  • python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0]
  • espnet version: espnet 202207
  • pytorch version: pytorch 1.12.1+cu102
  • Git hash: 13db69d3befc3c82a5ff5a11e28bf79d5030603f
    • Commit date: Mon Aug 29 13:44:35 2022 +0000

asr_train_asr_conformer5_raw_zh_TW_bpe2542_sp_lr1.0

CER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| inference_asr_model_valid.acc.best/dev_zh_TW | 2627 | 22200 | 97.7 | 2.1 | 0.2 | 0.0 | 2.4 | 9.5 |
| inference_asr_model_valid.acc.best/test_zh_TW | 2627 | 21991 | 98.0 | 1.6 | 0.4 | 0.1 | 2.1 | 7.7 |

TER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| inference_asr_model_valid.acc.best/dev_zh_TW | 2627 | 24827 | 98.6 | 1.2 | 0.2 | 0.0 | 1.5 | 4.0 |
| inference_asr_model_valid.acc.best/test_zh_TW | 2627 | 24618 | 98.8 | 0.9 | 0.4 | 0.1 | 1.3 | 3.4 |
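
In both tables, Err is the percentage of substitutions, deletions, and insertions relative to the number of reference units (characters for CER, tokens for TER), so it equals Sub + Del + Ins up to rounding. The sketch below illustrates that calculation with a plain edit-distance alignment. It is an illustration only, not the scoring tool that produced the numbers above, and the function names `edit_counts` and `error_rate` are hypothetical.

```python
from typing import Sequence, Tuple


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of one minimum-edit alignment."""
    # dp[i][j] = (total_edits, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference unit deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis unit inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no edit needed
                continue
            t, s, d, n = dp[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare by total edit count first, so the minimum is kept.
            dp[i][j] = min(substitution, deletion, insertion)
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate in percent: (Sub + Del + Ins) / number of reference units."""
    subs, dels, ins = edit_counts(ref, hyp)
    return 100.0 * (subs + dels + ins) / len(ref)


# Character-level example: one substitution (氣 -> 汽) and one deletion (很).
reference = list("今天天氣很好")
hypothesis = list("今天天汽好")
print(edit_counts(reference, hypothesis))
print(round(error_rate(reference, hypothesis), 1))
```

Passing lists of characters gives a character error rate; passing lists of subword tokens gives the token-level equivalent.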