ESPnet2 ASR model

`espnet/yoshiki_chime4_whisper_medium_finetuning`

This model was trained by Yoshiki using the chime4 recipe in ESPnet.

Demo: How to use in ESPnet2

Follow the ESPnet installation instructions if you haven't done that already.

cd espnet
git checkout fe00740b80cd26fad7c550cd9e975609deb664db
pip install -e .
cd egs2/chime4/asr1
./run.sh --skip_data_prep false --skip_train true --download_model espnet/yoshiki_chime4_whisper_medium_finetuning
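
The recipe commands above run the full evaluation pipeline. To transcribe a single file from Python instead, something like the following sketch may work. It assumes that `espnet_model_zoo`, `soundfile`, and the Whisper dependency of ESPnet are installed, that the input is a single-channel 16 kHz WAV file, and that `ctc_weight=0.0` with `beam_size=1` matches the "noctc_greedy" decoding setup named in the results. None of these assumptions is stated in this card, and the helper name `transcribe` is hypothetical.

```python
"""Sketch: transcribe one WAV file with the published model (assumptions noted inline)."""

MODEL_TAG = "espnet/yoshiki_chime4_whisper_medium_finetuning"

# Assumption: these values mirror the "noctc_greedy" decoding name (no CTC score,
# beam size 1). Check the downloaded decode config before relying on them.
DECODE_KWARGS = {"ctc_weight": 0.0, "beam_size": 1}


def transcribe(wav_path: str, device: str = "cpu") -> str:
    """Return the best hypothesis for one recording (hypothetical helper)."""
    # Imported lazily so this module can be loaded without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.asr_inference import Speech2Text

    # from_pretrained downloads the model through espnet_model_zoo.
    speech2text = Speech2Text.from_pretrained(
        MODEL_TAG, device=device, **DECODE_KWARGS
    )

    speech, rate = sf.read(wav_path)
    # Assumption: the model expects 16 kHz audio, as in the CHiME-4 data.
    if rate != 16000:
        raise ValueError(f"expected 16 kHz audio, got {rate} Hz")

    # Each n-best entry is (text, tokens, token_ints, hypothesis).
    nbests = speech2text(speech)
    text, *_ = nbests[0]
    return text
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, would return the top hypothesis as a string. The imports sit inside the function so that the constants can be inspected on a machine without ESPnet.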

RESULTS

Environments

date: Fri Jul 21 19:08:31 JST 2023
python version: 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
espnet version: espnet 202304
pytorch version: pytorch 1.13.1
Git hash: d7172fcb7181ffdcca9c0061400254b63e37bf21
Commit date: Sat Jul 15 15:01:30 2023 +0900

/scratch/espnet-hackathon/egs2/chime4/asr1/exp4/asr_train_asr_whisper_full_warmup1500_raw_en_whisper_multilingual

WER

dataset	Snt	Wrd	Corr	Sub	Del	Ins	Err	S.Err
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track	1640	24791	97.7	1.9	0.5	0.7	3.0	25.7
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track	1640	24792	95.9	3.3	0.8	0.8	4.9	37.0
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track	1320	19341	96.3	3.2	0.5	0.8	4.5	33.6
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track	1320	19344	93.1	5.8	1.1	1.2	8.1	43.3

CER

dataset	Snt	Wrd	Corr	Sub	Del	Ins	Err	S.Err
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track	1640	141889	99.2	0.4	0.4	0.7	1.5	25.7
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track	1640	141900	98.2	0.9	0.9	0.8	2.6	37.0
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track	1320	110558	98.6	0.8	0.6	0.7	2.1	33.6
decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track	1320	110572	96.5	1.9	1.5	1.2	4.7	43.3
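
The columns in both tables follow the usual sclite-style convention: Err is the sum of Sub, Del, and Ins, and Corr, Sub, and Del add up to 100, all as percentages of the reference length. The short script below is a sanity check under that assumption. It copies the eight rows from the tables and verifies both identities within rounding tolerance. The function names are illustrative and not part of ESPnet.

```python
"""Sanity-check the scoring tables: Err should equal Sub + Del + Ins up to rounding."""

# Rows copied from the WER and CER tables: (metric, set, Corr, Sub, Del, Ins, Err).
ROWS = [
    ("WER", "dt05_real", 97.7, 1.9, 0.5, 0.7, 3.0),
    ("WER", "dt05_simu", 95.9, 3.3, 0.8, 0.8, 4.9),
    ("WER", "et05_real", 96.3, 3.2, 0.5, 0.8, 4.5),
    ("WER", "et05_simu", 93.1, 5.8, 1.1, 1.2, 8.1),
    ("CER", "dt05_real", 99.2, 0.4, 0.4, 0.7, 1.5),
    ("CER", "dt05_simu", 98.2, 0.9, 0.9, 0.8, 2.6),
    ("CER", "et05_real", 98.6, 0.8, 0.6, 0.7, 2.1),
    ("CER", "et05_simu", 96.5, 1.9, 1.5, 1.2, 4.7),
]

# Every printed percentage is rounded to one decimal, so an identity involving
# four rounded values can be off by up to 4 * 0.05 = 0.2 points.
# The small epsilon absorbs floating-point noise.
TOLERANCE = 0.2 + 1e-9


def check_row(corr, sub, dele, ins, err):
    """Return (err_gap, total_gap), the deviations from the two identities."""
    err_gap = abs((sub + dele + ins) - err)
    total_gap = abs((corr + sub + dele) - 100.0)
    return err_gap, total_gap


def inconsistent_rows(rows=ROWS, tol=TOLERANCE):
    """List the (metric, set) pairs whose columns disagree by more than tol."""
    bad = []
    for metric, name, corr, sub, dele, ins, err in rows:
        err_gap, total_gap = check_row(corr, sub, dele, ins, err)
        if err_gap > tol or total_gap > tol:
            bad.append((metric, name))
    return bad


for metric, name, corr, sub, dele, ins, err in ROWS:
    print(f"{metric} {name}: Sub+Del+Ins = {sub + dele + ins:.1f}, reported Err = {err:.1f}")
print("inconsistent rows:", inconsistent_rows())
```

Some rows differ by 0.1 (for example, 1.9 + 0.5 + 0.7 = 3.1 against a reported 3.0). This is expected, because each column is rounded independently.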

Citing ESPnet

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

or arXiv:

@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}