---
tags:
- espnet
- audio
- automatic-speech-recognition
language:
- et
license: apache-2.0
metrics:
- wer
model-index:
- name: e-branchformer et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: ERR2020
      type: audio
    metrics:
    - name: Wer
      type: wer
      value: 9.9
---

# e-branchformer et

Estonian ASR model trained with the ESPnet2 E-Branchformer based recipe (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1) on the ERR2020 dataset.

- WER on ERR2020: 9.9
- WER on mozilla commonvoice_11: 20.8

For usage:

- clone this repo (`git clone https://huggingface.co/rristo/espnet_ebranchformer_et`)
- go to the repo (`cd espnet_ebranchformer_et`)
- build the Docker image with the needed libraries (`build.sh` or `build.bat`)
- run the Docker container (`run.sh`). This mounts the current directory
- run the notebook `example_usage.ipynb` for example usage
- the model currently expects audio in .wav format

## Model description

ASR model for Estonian, trained on the Estonian Public Broadcasting ERR2020 data (around 340 hours of audio).

## Intended uses & limitations

Pretty much a toy model, trained on a limited amount of data. It might not work well on out-of-domain data (especially spontaneous or noisy speech).

## Training and evaluation data

Trained on ERR2020 data, evaluated on ERR2020 and Mozilla Common Voice test data.

## Training procedure

Used the ESPnet E-Branchformer based recipe (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1).

### Training results

See the folder `exp/images` for training plots. Validation results are in `exp/RESULTS.md`.

### Framework versions

- espnet2
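
The WER figures reported above (9.9 on ERR2020, 20.8 on commonvoice_11) are word error rates, presumably given in percent, as is conventional. As a rough illustration of how such a number is obtained, the sketch below computes WER as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are made up for this illustration; this is not the scoring script of the ESPnet recipe, which may also apply its own text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100.

    Hypothetical helper for illustration only; not part of ESPnet.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Invented example: one substitution and one deletion against 5 reference words.
reference = "tere tulemast eesti raadio uudistesse"
hypothesis = "tere tulemas eesti uudistesse"
print(word_error_rate(reference, hypothesis))  # 2 errors / 5 words -> 40.0
```

Because the denominator is the reference length, insertions can push WER above 100 for very poor hypotheses. A corpus-level figure like the ones reported here is normally obtained by summing errors and reference words over all utterances before dividing, not by averaging per-utterance values.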