
ssw_finetune

This model is a fine-tuned version of Akashpb13/Swahili_xlsr on the ml-superb-subset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4301
  • WER: 42.1488 (word error rate, reported as a percentage)
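
Since the base checkpoint, Akashpb13/Swahili_xlsr, is a wav2vec2 XLS-R model with a CTC head, a fine-tune like this one can typically be run through the automatic-speech-recognition pipeline. A minimal inference sketch, assuming the checkpoint is published under the hypothetical Hub id your-username/ssw_finetune:

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual path of this checkpoint.
asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/ssw_finetune",
)

# File inputs are decoded and resampled to the model's expected rate
# (16 kHz for XLS-R checkpoints) before transcription.
print(asr("sample.wav")["text"])
```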

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto TrainingArguments follows the list:

  • learning_rate: 9.6e-05
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 25
  • training_steps: 500
  • mixed_precision_training: Native AMP
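
For reference, here is a hedged sketch of how the values above map onto transformers.TrainingArguments. The original training script is not included in the card, so the arguments below are the standard Trainer equivalents, not a verbatim reproduction:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the listed hyperparameters.
# Effective batch size: 32 (per device) * 2 (accumulation steps) = 64.
training_args = TrainingArguments(
    output_dir="ssw_finetune",
    learning_rate=9.6e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    warmup_steps=25,
    max_steps=500,  # "training_steps: 500"
    fp16=True,      # "Native AMP" mixed-precision training
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults,
    # so they need no explicit arguments here.
)
```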

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER (%)  |
|:-------------:|:-------:|:----:|:---------------:|:--------:|
| 22.1208       | 0.8333  | 10   | 25.1031         | 100.5510 |
| 12.838        | 1.6667  | 20   | 10.4898         | 100.0    |
| 4.2236        | 2.5     | 30   | 3.9356          | 100.0    |
| 3.4491        | 3.3333  | 40   | 3.4590          | 100.0    |
| 3.2593        | 4.1667  | 50   | 3.3211          | 100.0    |
| 3.1611        | 5.0     | 60   | 3.1737          | 100.0    |
| 3.1157        | 5.8333  | 70   | 3.1089          | 100.0    |
| 3.0472        | 6.6667  | 80   | 3.0868          | 100.0    |
| 3.0291        | 7.5     | 90   | 3.0445          | 100.0    |
| 2.9996        | 8.3333  | 100  | 3.0058          | 100.0    |
| 2.9187        | 9.1667  | 110  | 2.9600          | 100.0    |
| 2.7708        | 10.0    | 120  | 2.7274          | 100.0    |
| 2.5396        | 10.8333 | 130  | 2.4602          | 100.0    |
| 2.0911        | 11.6667 | 140  | 1.8863          | 100.0    |
| 1.4477        | 12.5    | 150  | 1.2924          | 95.8678  |
| 1.042         | 13.3333 | 160  | 0.9620          | 80.1653  |
| 0.8089        | 14.1667 | 170  | 0.7520          | 67.4931  |
| 0.6621        | 15.0    | 180  | 0.6530          | 53.7190  |
| 0.5476        | 15.8333 | 190  | 0.5838          | 50.6887  |
| 0.4866        | 16.6667 | 200  | 0.5662          | 50.4132  |
| 0.4296        | 17.5    | 210  | 0.5303          | 49.5868  |
| 0.3977        | 18.3333 | 220  | 0.5121          | 47.9339  |
| 0.392         | 19.1667 | 230  | 0.4895          | 47.3829  |
| 0.346         | 20.0    | 240  | 0.4825          | 44.3526  |
| 0.3226        | 20.8333 | 250  | 0.4628          | 45.1791  |
| 0.3145        | 21.6667 | 260  | 0.4662          | 45.1791  |
| 0.2948        | 22.5    | 270  | 0.4492          | 41.8733  |
| 0.2857        | 23.3333 | 280  | 0.4484          | 43.2507  |
| 0.2571        | 24.1667 | 290  | 0.4511          | 43.2507  |
| 0.2706        | 25.0    | 300  | 0.4382          | 41.8733  |
| 0.2404        | 25.8333 | 310  | 0.4528          | 42.1488  |
| 0.2498        | 26.6667 | 320  | 0.4428          | 41.5978  |
| 0.2381        | 27.5    | 330  | 0.4377          | 40.2204  |
| 0.2142        | 28.3333 | 340  | 0.4300          | 41.0468  |
| 0.2236        | 29.1667 | 350  | 0.4305          | 42.1488  |
| 0.2249        | 30.0    | 360  | 0.4253          | 41.0468  |
| 0.209         | 30.8333 | 370  | 0.4272          | 42.9752  |
| 0.2071        | 31.6667 | 380  | 0.4363          | 43.8017  |
| 0.2209        | 32.5    | 390  | 0.4328          | 44.6281  |
| 0.2012        | 33.3333 | 400  | 0.4351          | 44.0771  |
| 0.1895        | 34.1667 | 410  | 0.4362          | 43.8017  |
| 0.1921        | 35.0    | 420  | 0.4383          | 45.1791  |
| 0.1805        | 35.8333 | 430  | 0.4381          | 45.1791  |
| 0.1963        | 36.6667 | 440  | 0.4331          | 41.3223  |
| 0.1829        | 37.5    | 450  | 0.4301          | 41.5978  |
| 0.1927        | 38.3333 | 460  | 0.4290          | 41.8733  |
| 0.1779        | 39.1667 | 470  | 0.4289          | 42.4242  |
| 0.1892        | 40.0    | 480  | 0.4302          | 42.1488  |
| 0.2025        | 40.8333 | 490  | 0.4300          | 42.4242  |
| 0.2105        | 41.6667 | 500  | 0.4301          | 42.1488  |
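
The WER column is a percentage: 100.0 means essentially no reference words were recovered, and values above 100 are possible early in training because insertions also count as errors. A minimal sketch of reproducing a score on this scale with the evaluate library (an assumption about tooling; the card does not state which implementation produced these numbers):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Toy example: one substitution among five reference words -> 20.0.
predictions = ["hello wold again", "good morning"]
references = ["hello world again", "good morning"]

# evaluate's "wer" returns a fraction; scale by 100 to match the table.
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```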

Framework versions

  • Transformers 4.41.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1