ESPnet2 ASR model

espnet/yoshiki_chime4_whisper_medium_finetuning

This model was trained by Yoshiki using the chime4 recipe in ESPnet.

Demo: How to use in ESPnet2

Follow the ESPnet installation instructions if you haven't done that already.

cd espnet
git checkout fe00740b80cd26fad7c550cd9e975609deb664db
pip install -e .
cd egs2/chime4/asr1
./run.sh --skip_data_prep false --skip_train true --download_model espnet/yoshiki_chime4_whisper_medium_finetuning

RESULTS

Environments

  • date: Fri Jul 21 19:08:31 JST 2023
  • python version: 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
  • espnet version: espnet 202304
  • pytorch version: pytorch 1.13.1
  • Git hash: d7172fcb7181ffdcca9c0061400254b63e37bf21
    • Commit date: Sat Jul 15 15:01:30 2023 +0900

/scratch/espnet-hackathon/egs2/chime4/asr1/exp4/asr_train_asr_whisper_full_warmup1500_raw_en_whisper_multilingual

WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|24791|97.7|1.9|0.5|0.7|3.0|25.7|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|24792|95.9|3.3|0.8|0.8|4.9|37.0|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|19341|96.3|3.2|0.5|0.8|4.5|33.6|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|19344|93.1|5.8|1.1|1.2|8.1|43.3|

CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track|1640|141889|99.2|0.4|0.4|0.7|1.5|25.7|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track|1640|141900|98.2|0.9|0.9|0.8|2.6|37.0|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track|1320|110558|98.6|0.8|0.6|0.7|2.1|33.6|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track|1320|110572|96.5|1.9|1.5|1.2|4.7|43.3|
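
Reading the columns: the scores follow the usual sclite-style convention, in which Err is the sum of Sub, Del and Ins, and Corr, Sub and Del together account for 100% of the reference tokens. The sketch below checks both identities against the rows of the WER and CER tables. It is an illustration rather than part of the recipe: the shortened set names, the helper functions `check_row` and `check_table`, and the 0.2 tolerance (chosen to absorb rounding to one decimal place) are assumptions made here.

```python
# Rows copied from the WER and CER tables: (set, Corr, Sub, Del, Ins, Err).
# Set names are shortened for readability (hypothetical labels).
WER_ROWS = [
    ("dt05_real", 97.7, 1.9, 0.5, 0.7, 3.0),
    ("dt05_simu", 95.9, 3.3, 0.8, 0.8, 4.9),
    ("et05_real", 96.3, 3.2, 0.5, 0.8, 4.5),
    ("et05_simu", 93.1, 5.8, 1.1, 1.2, 8.1),
]
CER_ROWS = [
    ("dt05_real", 99.2, 0.4, 0.4, 0.7, 1.5),
    ("dt05_simu", 98.2, 0.9, 0.9, 0.8, 2.6),
    ("et05_real", 98.6, 0.8, 0.6, 0.7, 2.1),
    ("et05_simu", 96.5, 1.9, 1.5, 1.2, 4.7),
]

# Assumed tolerance: every figure is rounded to one decimal place,
# so sums of rounded values can drift slightly from the reported total.
TOLERANCE = 0.2


def check_row(corr, sub, dele, ins, err, tol=TOLERANCE):
    """Return True if a row satisfies both scoring identities within tol."""
    err_ok = abs((sub + dele + ins) - err) <= tol      # Err = Sub + Del + Ins
    corr_ok = abs((corr + sub + dele) - 100.0) <= tol  # Corr + Sub + Del = 100
    return err_ok and corr_ok


def check_table(rows):
    """Map each set name to the result of check_row for that row."""
    return {name: check_row(*values) for name, *values in rows}


if __name__ == "__main__":
    for label, rows in (("WER", WER_ROWS), ("CER", CER_ROWS)):
        for name, ok in check_table(rows).items():
            print(label, name, "consistent" if ok else "inconsistent")
```

Because insertions are counted in Err but not in Corr, Err can exceed 100 minus Corr; in the dt05_real WER row, for example, Corr is 97.7 while Err is 3.0.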

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

or arXiv:

@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}