---
license: apache-2.0
language:
- ja
library_name: espnet
tags:
- automatic-speech-recognition
---

# reazonspeech-espnet-v2

`reazonspeech-espnet-v2` is an automatic speech recognition (ASR) model trained on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).

## Model Architecture

The general architecture is the same as [reazonspeech-espnet-v1](https://huggingface.co/reazon-research/reazonspeech-espnet-v1).

* Conformer-Transducer model with 118.85M parameters.

* We trained this model for 33 epochs using the Adam optimizer.
  The maximum learning rate was 0.02, with 15000 warmup steps.

* The training audio files were sampled at 16 kHz. Make sure that your
  input audio files use the same sampling rate.

## Usage

We provide a `transcribe()` function that is suitable for use with this model.

```
from espnet2.bin.asr_inference import Speech2Text
from reazonspeech.espnet.asr import transcribe

# Load the model from its training config and averaged checkpoint.
speech2text = Speech2Text(
    "exp/asr_train_asr_conformer_raw_jp_char/config.yaml",
    "exp/asr_train_asr_conformer_raw_jp_char/valid.acc.ave_10best.pth",
    device="cuda"
)

# Print each caption produced for the input audio file.
for cap in transcribe("speech.wav", speech2text):
    print(cap)
```

## License

[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)
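
## Checking the input sampling rate

The model was trained on 16 kHz audio, so it can help to verify a file's sampling rate before passing it to `transcribe()`. The following is a minimal sketch that uses only Python's standard `wave` module. The helper names `wav_sample_rate` and `assert_16khz` are hypothetical and are not part of ReazonSpeech or ESPnet, and the sketch assumes the input is an uncompressed PCM WAV file.

```python
import wave

EXPECTED_RATE = 16000  # Hz, the sampling rate of the training audio


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def assert_16khz(path, expected=EXPECTED_RATE):
    """Raise ValueError if the WAV file is not sampled at the expected rate.

    Hypothetical helper for illustration; it checks the header only and
    does not resample the audio.
    """
    rate = wav_sample_rate(path)
    if rate != expected:
        raise ValueError(
            f"{path}: sampling rate is {rate} Hz, expected {expected} Hz; "
            "resample the file before transcribing"
        )
    return rate
```

Calling `assert_16khz("speech.wav")` before transcription would surface a mismatched file early. Files in other formats or at other rates would need to be converted with an audio tool of your choice first.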