Whisper Tiny RixVox Swedish

This is a Whisper tiny model fine-tuned for Swedish on the RixVox dataset.
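A minimal usage sketch, assuming the Hugging Face `transformers` library is installed along with an audio backend such as ffmpeg. The helper name `transcribe` and the file name in the usage note are hypothetical; only the model ID comes from this card.

```python
MODEL_ID = "KBLab/whisper-tiny-rixvox"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned model (sketch only)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint from the Hugging Face Hub on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

Calling `transcribe("speech.wav")` on a local recording should return the transcript as a string. Building the pipeline once and reusing it is preferable when transcribing many files.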

Please note that this model, like every other encoder-decoder speech-to-text model, is prone to hallucinating on unexpected inputs and treats the task as translation rather than transcription. That is, your mileage may vary depending on how the input is filtered and on the type of data.
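One common way to filter hallucinated output is to flag transcripts that are unusually repetitive. The sketch below uses a zlib compression-ratio heuristic; the function names are hypothetical, and the 2.4 threshold is an assumption borrowed from the default in the reference Whisper implementation, not something tuned for this model.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw to zlib-compressed size; repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_hallucinated(text: str, max_ratio: float = 2.4) -> bool:
    """Flag a transcript whose compression ratio exceeds max_ratio (assumed threshold)."""
    if not text.strip():
        # Empty output is not flagged here; handle silence separately.
        return False
    return compression_ratio(text) > max_ratio
```

A transcript that loops on the same phrase compresses very well and is flagged, while an ordinary sentence is not. The heuristic only catches repetition loops; fluent but wrong output passes through.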

In this release the entire encoder was frozen during fine-tuning. Subsequent releases will leave the encoder unfrozen, provided the model still generalizes to other types of data (i.e., not parliamentary speeches) without the freeze.
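Freezing the encoder amounts to turning off gradient computation for its parameters. The following is a minimal sketch, assuming a PyTorch-style model that exposes `get_encoder()` (as Whisper models in Hugging Face Transformers do); the helper name `freeze_encoder` is hypothetical and this is not necessarily how the release was trained.

```python
def freeze_encoder(model) -> int:
    """Disable gradients for all encoder parameters.

    Returns the number of parameter tensors that were frozen.
    """
    frozen = 0
    for param in model.get_encoder().parameters():
        # Parameters without gradients are skipped by the optimizer update.
        param.requires_grad = False
        frozen += 1
    return frozen


# Hypothetical usage (assumes the base checkpoint is openai/whisper-tiny):
# from transformers import WhisperForConditionalGeneration
# model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
# freeze_encoder(model)
```

With the encoder frozen, only the decoder weights are updated, which reduces memory use and keeps the acoustic representations of the base model intact.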

Evaluation

Fleurs WER: 51.68
Fleurs WER (normalized*): 48.09

*) Normalization is done by applying the following to source and generated texts:

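from re import sub
def normalize(s):
    # Lowercase, replace anything that is not a digit, Swedish letter or space with a space, collapse whitespace.
    return ' '.join(sub('[^0-9a-zåäöA-ZÅÄÖ ]', ' ', s.lower()).split())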
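To illustrate how normalization affects the score, here is a sketch of a corpus-level word error rate built on word-level edit distance. The `normalize` function repeats the one above; `edit_distance` and `wer` are hypothetical helpers, and the tooling actually used to produce the reported numbers is not stated in this card.

```python
from re import sub


def normalize(s):
    # Same normalization as above: lowercase, strip punctuation, collapse spaces.
    return ' '.join(sub('[^0-9a-zåäöA-ZÅÄÖ ]', ' ', s.lower()).split())


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def wer(references, hypotheses, normalized=False):
    """Corpus-level WER in percent: total word edits / total reference words."""
    edits = words = 0
    for ref, hyp in zip(references, hypotheses):
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        edits += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    return 100.0 * edits / words
```

For the pair `"Hej, världen!"` and `"hej världen"`, the raw score counts both words as errors because of casing and punctuation, while the normalized score treats them as identical.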

Training

Training was done using Hugging Face and DeepSpeed with ZeRO stage 2.

learning rate: 1e-5
optimizer: CPUAdamW (DeepSpeed)
lr scheduler: linear
warmup steps: 500
per device batch size: 32
GPUs: 8 x NVIDIA A100 40GB
total batch size: 160
steps: 10000
lowercase: no
precision: fp16
encoder: entirely frozen
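The settings above could be expressed as a DeepSpeed configuration roughly as follows. This is a sketch rather than the configuration used for this release: the name `ds_config` is hypothetical, CPU optimizer offload is inferred from the CPUAdamW entry, and `WarmupDecayLR` is assumed as the DeepSpeed counterpart of a linear schedule with warmup.

```python
import json

# Sketch of a DeepSpeed config mirroring the listed hyperparameters.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Assumption: the optimizer state lives on the CPU, which is when
        # DeepSpeed uses its CPU Adam implementation.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "scheduler": {
        # Assumption: linear warmup followed by linear decay.
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 1e-5,
            "warmup_num_steps": 500,
            "total_num_steps": 10000,
        },
    },
    "train_micro_batch_size_per_gpu": 32,
}

# Serialized form, e.g. for writing to a JSON config file.
ds_config_json = json.dumps(ds_config, indent=2)
```

The Hugging Face `TrainingArguments` class accepts such a configuration through its `deepspeed` argument, either as a dict or as a path to a JSON file.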

KBLab/whisper-tiny-rixvox