language:
  - 'no'
license: apache-2.0
tags:
  - whisper-event
  - norwegian
datasets:
  - NbAiLab/NCC_S
  - NbAiLab/NPSC
  - NbAiLab/NST
  - google/fleurs
metrics:
  - wer
model-index:
  - name: Whisper Tiny Norwegian Bokmål
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: FLEURS
          type: google/fleurs
          config: nb_no
          split: test
          args: nb_no
        metrics:
          - name: Wer
            type: wer
            value: 47.08

Whisper Tiny Norwegian Bokmål

This model is a fine-tuned version of openai/whisper-tiny trained on several datasets.

It is currently in the middle of a large training run. At this point it achieves the following results on the evaluation set:

  • Loss: 1.464
  • Wer: 47.08
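For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate (WER) can be computed. It assumes the figure above is a percentage, and it uses plain whitespace tokenization with no text normalization, which may differ from the evaluation script actually used for this model. The function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("bor" -> "bodde") and one insertion ("by")
# against a four-word reference.
example_wer = word_error_rate("jeg bor i oslo", "jeg bodde i oslo by")
print(f"{example_wer:.2f}")  # prints 50.00
```

Under this definition, a WER of 47.08 means that the number of word-level edits needed to turn the model output into the reference is a little under half the number of reference words.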

Model description

The model is trained on a large corpus of roughly 5,000 hours of speech. The sources are subtitles from the Norwegian broadcaster NRK, transcribed speeches from the Norwegian parliament, and voice recordings from Norsk Språkteknologi.

Intended uses & limitations

The model will be free for everyone to use when it is finished.

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-06
  • train_batch_size: 128
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 100,000 (currently at 4,000)
  • mixed_precision_training: fp16
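
To make the scheduler settings concrete, here is a small sketch of the learning rate they imply at a given step. It assumes that `lr_scheduler_type: linear` means the common scheme of linear warmup from zero to the peak rate followed by linear decay to zero at the final step; the helper `linear_schedule_lr` is a hypothetical name written for this illustration and is not taken from the training code.

```python
def linear_schedule_lr(
    step: int,
    peak_lr: float = 3e-6,       # learning_rate from the list above
    warmup_steps: int = 500,     # lr_scheduler_warmup_steps
    total_steps: int = 100_000,  # planned training_steps
) -> float:
    """Learning rate at `step` under linear warmup then linear decay (assumed)."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Learning rate at a few points, including the current checkpoint (step 4,000).
for step in (0, 250, 500, 4_000, 100_000):
    print(step, f"{linear_schedule_lr(step):.3e}")
```

Under these assumptions the checkpoint at step 4,000 is still very close to the peak learning rate, since only a small fraction of the decay phase has elapsed.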

Training results

See TensorBoard metrics.