metadata

license: apache-2.0
language:
  - ja
library_name: espnet
tags:
  - automatic-speech-recognition

reazonspeech-espnet-v2

reazonspeech-espnet-v2 is an automatic speech recognition (ASR) model trained on the ReazonSpeech v2.0 corpus.

Model Architecture

The general architecture is the same as reazonspeech-espnet-v1.

It is a Conformer-Transducer model with 118.85M parameters.
We trained this model for 33 epochs using the Adam optimizer. The maximum learning rate was 0.02, with 15000 warmup steps.
The training audio files were sampled at 16 kHz. Make sure that your input audio files have the same sampling rate.
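
Because the model expects 16 kHz input, it can help to check a file's sampling rate before transcribing it. The following is a minimal sketch using only Python's standard `wave` module. The helper name `has_expected_rate` and the constant `EXPECTED_RATE` are illustrative and not part of reazonspeech or ESPnet, and the check only works for uncompressed PCM WAV files.

```python
import wave

# Sampling rate stated for the training audio, in Hz.
EXPECTED_RATE = 16_000


def has_expected_rate(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True if the PCM WAV file at `path` has the expected sampling rate.

    Hypothetical helper for illustration; it reads only the WAV header.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected
```

A file that fails this check would need to be resampled to 16 kHz with an audio tool of your choice before being passed to the model.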

Usage

We provide a transcribe() function that is designed for use with this model.

from espnet2.bin.asr_inference import Speech2Text
from reazonspeech.espnet.asr import transcribe

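# Load the model from its training config and checkpoint.
speech2text = Speech2Text(
    "exp/asr_train_asr_conformer_raw_jp_char/config.yaml",
    "exp/asr_train_asr_conformer_raw_jp_char/valid.acc.ave_10best.pth",
    device="cuda",  # use "cpu" if no GPU is available
)

# Transcribe the audio file and print each result as it is produced.
for cap in transcribe("speech.wav", speech2text):
    print(cap)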

License

Apache License 2.0