---
tags:
- k2-fsa
- icefall
- audio
- automatic-speech-recognition
language:
- et
license: apache-2.0
metrics:
- wer
model-index:
- name: conformer-ctc et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: ERR2020
      type: audio
    metrics:
    - name: Wer
      type: wer
      value: 12.1
---

# conformer-ctc et

Estonian ASR model trained on the ERR2020 dataset with a recipe based on Icefall conformer-ctc3 (https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc3).

- WER on ERR2020: 12.1
- WER on Mozilla Common Voice 11 (`commonvoice_11`): 23.2

For usage:

- clone this repo (`git clone https://huggingface.co/rristo/icefall_conformer_ctc3_et`)
- go to the repo (`cd icefall_conformer_ctc3_et`)
- build the Docker image with the needed libraries (`build.sh` or `build.bat`)
- run the Docker container (`run.sh`). This mounts the current directory
- run the notebook `err2020/conformer_ctc3_usage.ipynb` for example usage
  - currently expects audio to be in .wav format

## Model description

ASR model for Estonian, trained on the Estonian Public Broadcasting ERR2020 dataset (around 340 hours of audio).

## Intended uses & limitations

Pretty much a toy model, trained on a limited amount of data. It might not work well on out-of-domain data, especially spontaneous or noisy speech.

## Training and evaluation data

Trained on ERR2020 data; evaluated on the ERR2020 and Mozilla Common Voice test data.

## Training procedure

Used a recipe based on Icefall conformer-ctc3 (https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc3).

### Training results

TODO

### Framework versions

- icefall
- k2
- kaldifeat==1.24
- lhotse==1.15.0
- torch==2.0.0
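
The WER figures reported in this card are word error rates. For readers who want to score their own transcripts against this model's output, here is a minimal sketch of a corpus-level WER computation in plain Python. It is not the scoring script used to produce the numbers above. The function names `edit_distance` and `wer`, the whitespace tokenization, the absence of any text normalization, and the example sentences are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference to each hypothesis prefix
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # distance from this reference prefix to the empty hypothesis
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(pairs):
    """Corpus-level WER in percent over (reference, hypothesis) string pairs.

    Assumption: words are separated by whitespace and no normalization is applied.
    """
    errors = 0
    words = 0
    for ref, hyp in pairs:
        ref_tokens = ref.split()
        errors += edit_distance(ref_tokens, hyp.split())
        words += len(ref_tokens)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return 100.0 * errors / words


# Hypothetical example transcripts, not taken from ERR2020 or Common Voice.
example_pairs = [
    ("tere tulemast eesti raadiosse", "tere tulemast eesti raadios"),  # 1 substitution
    ("ilm on täna ilus", "ilm on ilus"),                               # 1 deletion
]
print(f"WER: {wer(example_pairs):.1f}")  # WER: 25.0
```

Errors are summed over the whole corpus before dividing by the total number of reference words, so long utterances weigh more than short ones. Lowercasing or punctuation stripping, if wanted, would have to be applied to both sides before calling `wer`.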