metadata

language:
  - 'no'
license: apache-2.0
tags:
  - audio
  - asr
  - automatic-speech-recognition
  - hf-asr-leaderboard
model-index:
  - name: scream_sextusdecimus_virtual_tsfix_small
    results: []

scream_sextusdecimus_virtual_tsfix_small

This model is a fine-tuned version of openai/whisper-small on the NbAiLab/ncc_speech dataset.
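
For readers who want to try the checkpoint, a minimal inference sketch using the Transformers `pipeline` API might look like the following. The Hub identifier is an assumption: this card gives only the model name, not the namespace it is hosted under, so the value of `MODEL_ID` may need a prefix. The helper name `transcribe` is hypothetical and introduced only for illustration.

```python
# Assumed Hub id: the card states only the model name, so a namespace may be
# needed in front of it. Adjust to wherever the checkpoint is actually hosted.
MODEL_ID = "scream_sextusdecimus_virtual_tsfix_small"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with a Whisper-style checkpoint."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline returns a dict whose "text" field holds the transcript.
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder path, should return the transcript as a string, provided the model id resolves and the audio file can be decoded.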

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
lr_scheduler_type: linear
per_device_train_batch_size: 32
total_train_batch_size_per_node: 128
total_train_batch_size: 1024
total_optimization_steps: 20,000
starting_optimization_step: None
finishing_optimization_step: 20,000
num_train_dataset_workers: 32
num_hosts: 8
total_num_training_examples: 20,480,000
steps_per_epoch: To be computed after first epoch
num_beams: 5
dropout: True
bpe_dropout_probability: 0.1
activation_dropout_probability: 0.1
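
The batch-size and step figures above are mutually consistent, which the sketch below checks, together with an illustration of what a linear schedule would do to the learning rate. Two things are assumptions rather than facts from this card: the number of devices per host (inferred as 128 / 32 = 4) and the absence of warmup (the card lists no warmup setting, so `warmup=0` is only a default for illustration). The function `linear_lr` is a hypothetical helper, not the training code.

```python
PER_DEVICE_BATCH = 32    # per_device_train_batch_size
BATCH_PER_NODE = 128     # total_train_batch_size_per_node
NUM_HOSTS = 8            # num_hosts
TOTAL_STEPS = 20_000     # total_optimization_steps
PEAK_LR = 5e-5           # learning_rate

# Inferred, not stated in the card: devices per host.
devices_per_host = BATCH_PER_NODE // PER_DEVICE_BATCH

# Global batch size and total number of examples seen over the run.
total_batch = BATCH_PER_NODE * NUM_HOSTS
total_examples = total_batch * TOTAL_STEPS


def linear_lr(step: int, peak: float = PEAK_LR,
              total: int = TOTAL_STEPS, warmup: int = 0) -> float:
    """Linear warmup followed by linear decay to zero.

    warmup=0 is an assumption; the card does not list a warmup setting.
    """
    if warmup > 0 and step < warmup:
        return peak * step / warmup
    remaining = max(0, total - step)
    return peak * remaining / max(1, total - warmup)


print(devices_per_host, total_batch, total_examples)
print(linear_lr(0), linear_lr(TOTAL_STEPS // 2), linear_lr(TOTAL_STEPS))
```

Under these assumptions the learning rate starts at 5e-05, is halved at step 10,000, and reaches zero at step 20,000.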

Training results

step	eval_loss	train_loss	eval_wer	eval_cer	eval_exact_wer	eval_exact_cer
0	1.2807	3.0725	196.6092	157.4275	196.6092	157.4275
1000	0.5902	1.0592	15.1695	4.8382	15.1695	4.8382
2000	0.4240	0.8640	11.3623	3.9308	11.3623	3.9308
3000	0.4213	0.7930	9.4587	3.3537	9.4587	3.3537
4000	0.4353	0.7986	9.3694	3.5263	9.3694	3.5263
5000	0.4697	0.7580	9.7858	4.1478	9.7858	4.1478
6000	0.4535	0.7003	10.0238	4.2119	10.0238	4.2119
7000	0.4608	0.7296	8.8638	3.4228	8.8638	3.4228
8000	0.3902	0.7053	8.9233	3.6003	8.9233	3.6003
9000	0.3575	0.7124	9.3992	3.9702	9.3992	3.9702
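
To make the metric columns easier to interpret, the sketch below shows the standard definitions of word error rate and character error rate as an edit distance divided by the reference length, expressed as percentages. This is a generic illustration, not the evaluation code used for this model; in particular, any text normalization behind the `eval_wer` and `eval_exact_wer` columns is not described in this card. Because insertions count as errors, WER can exceed 100, which is consistent with the step-0 row. The last lines pick the logged step with the lowest `eval_wer` from the table above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# eval_wer values copied from the table above (step 0 omitted).
EVAL_WER = {
    1000: 15.1695, 2000: 11.3623, 3000: 9.4587, 4000: 9.3694, 5000: 9.7858,
    6000: 10.0238, 7000: 8.8638, 8000: 8.9233, 9000: 9.3992,
}
best_step = min(EVAL_WER, key=EVAL_WER.get)
print(best_step, EVAL_WER[best_step])
```

Among the logged steps, the lowest `eval_wer` is at step 7000, while the lowest `eval_loss` is at step 9000, so the choice of checkpoint depends on which metric is prioritized.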

Framework versions

Transformers 4.30.0.dev0
Datasets 2.12.1.dev0
Tokenizers 0.13.3