metadata

language: fi
thumbnail: null
tags:
  - automatic-speech-recognition
  - Attention
  - pytorch
  - speechbrain
metrics:
  - wer
  - cer

Description

Attention-based encoder-decoder model trained on Puhelahjat (1500 h of colloquial Finnish donated by a large number of volunteers) and the Finnish Parliament ASR Corpus (3000 h of speech from sessions of the Finnish Parliament). The encoder is a CRDNN (Conv+LSTM+DNN); the decoder is a GRU.
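To make the architecture description more concrete, here is a minimal PyTorch sketch of a Conv+LSTM+DNN encoder feeding one step of an attention-equipped GRU decoder. It is not this model's actual implementation: the class names, every layer size, the 80-dimensional input features, the vocabulary size, and the choice of dot-product attention are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class TinyCRDNN(nn.Module):
    """Illustrative Conv + LSTM + DNN encoder. All sizes are hypothetical."""

    def __init__(self, n_feats=80, channels=32, lstm_hidden=256, dnn_dim=256):
        super().__init__()
        # Convolution over (time, frequency); pooling halves the frequency axis only.
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        self.lstm = nn.LSTM(
            input_size=channels * (n_feats // 2),
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.dnn = nn.Sequential(nn.Linear(2 * lstm_hidden, dnn_dim), nn.LeakyReLU())

    def forward(self, feats):
        # feats: (batch, time, n_feats)
        x = self.conv(feats.unsqueeze(1))      # (batch, channels, time, n_feats // 2)
        x = x.permute(0, 2, 1, 3).flatten(2)   # (batch, time, channels * n_feats // 2)
        x, _ = self.lstm(x)                    # (batch, time, 2 * lstm_hidden)
        return self.dnn(x)                     # (batch, time, dnn_dim)


class TinyAttnGRUDecoder(nn.Module):
    """Illustrative GRU decoder step with dot-product attention (an assumption)."""

    def __init__(self, vocab_size=100, emb_dim=64, enc_dim=256, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim + enc_dim, hidden)
        self.query = nn.Linear(hidden, enc_dim)
        self.out = nn.Linear(hidden, vocab_size)

    def step(self, prev_token, hidden, enc_out):
        # Score every encoder frame against the current decoder state.
        q = self.query(hidden).unsqueeze(2)                # (batch, enc_dim, 1)
        weights = torch.softmax(torch.bmm(enc_out, q).squeeze(2), dim=1)
        context = torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1)
        hidden = self.cell(torch.cat([self.emb(prev_token), context], dim=1), hidden)
        return self.out(hidden), hidden                    # logits, new state


encoder, decoder = TinyCRDNN(), TinyAttnGRUDecoder()
enc_out = encoder(torch.randn(2, 50, 80))                  # 2 utterances, 50 frames
logits, state = decoder.step(
    torch.zeros(2, dtype=torch.long), torch.zeros(2, 256), enc_out
)
```

A full decoder would repeat `step` token by token, feeding back the previous prediction until an end-of-sequence token is produced.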

Performance expectations

This is a relatively fast and compact model (~40M parameters); its performance is not state-of-the-art. It does not include a language model: the model is fully end-to-end.

This model should generalize to many types of speech. However, it will also try to match colloquial speech as it is spoken (unlike some models, which have learned to follow the written forms of Finnish).
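The colloquial-versus-written distinction matters when scoring with the WER and CER metrics listed in the metadata. The sketch below is a plain-Python illustration, not the evaluation code used for this model: the helper names are hypothetical, and the sentence pair is an invented example of a written-form reference compared against a colloquial transcript.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by reference length (non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by reference length (non-empty reference)."""
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: written-form reference vs. colloquial-style transcript.
written = "minä olen täällä"
colloquial = "mä oon täällä"
print(f"WER: {word_error_rate(written, colloquial):.2f}")
print(f"CER: {char_error_rate(written, colloquial):.2f}")
```

In this example two of the three reference words differ, so the word-level score is 2/3 even though the transcript could be a faithful rendering of what was said. When evaluating against written-form references, some of the measured error may reflect this mismatch rather than recognition mistakes.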