Description

Finnish Attention-based Encoder-Decoder model trained on:

Puhelahjat (1500h of colloquial Finnish donated by a large number of volunteers)
Finnish Parliament ASR Corpus (3000h speech from the sessions of the Finnish Parliament)

The encoder is a CRDNN (Conv+LSTM+DNN); the decoder is a GRU.

Performance expectations

This is a relatively fast and compact model (~40M parameters); its performance is not state-of-the-art. It does not include a language model: the model is fully end-to-end.
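To give a feel for what ~40M parameters means in practice, the following is a rough sketch of how the recurrent and dense layers in this kind of architecture contribute to a parameter budget, and how much memory the weights take at 32-bit precision. The helper names and the layer sizes (512, 1024) are hypothetical and chosen only for illustration; they are not the actual configuration of this model. The counting formulas assume the PyTorch convention of two bias vectors per recurrent layer.

```python
# Rough parameter budgeting for LSTM, GRU and dense layers.
# All layer sizes used below are hypothetical, NOT this model's real config.

def lstm_params(input_size, hidden_size, bidirectional=False):
    # 4 gates, each with input weights, recurrent weights and two bias vectors
    # (PyTorch convention: bias_ih and bias_hh).
    per_direction = 4 * (input_size * hidden_size
                         + hidden_size * hidden_size
                         + 2 * hidden_size)
    return per_direction * (2 if bidirectional else 1)

def gru_params(input_size, hidden_size):
    # 3 gates, same layout as the LSTM above.
    return 3 * (input_size * hidden_size
                + hidden_size * hidden_size
                + 2 * hidden_size)

def linear_params(in_features, out_features):
    # Weight matrix plus one bias vector.
    return in_features * out_features + out_features

def fp32_megabytes(n_params):
    # 4 bytes per 32-bit float, reported in megabytes (10**6 bytes).
    return n_params * 4 / 1e6

if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only.
    print("BiLSTM layer:", lstm_params(512, 1024, bidirectional=True))
    print("GRU layer:   ", gru_params(512, 1024))
    print("Dense layer: ", linear_params(1024, 512))
    # Weight storage for a model of roughly the stated size.
    print("40M params in fp32 (MB):", fp32_megabytes(40_000_000))
```

Under these assumptions, the weights of a 40M-parameter model occupy about 160 MB in 32-bit floats, which is consistent with describing the model as compact.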

This model should generalize to many types of speech. However, the model will also try to match colloquial speech (unlike some models, which have learned to follow the written forms of Finnish). In fact, being able to recognise many different dialects is a goal of the Puhelahjat data. The model is not especially robust to noise.