MSP-Fusion

This model is a fine-tuned version of a base model (not named in this card) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8566
  • Wer: 0.4860
  • Cer: 0.3326

Evaluation

Note: we deliberately evaluate the test set with batch_size=1 because of this issue: padded inputs do not yield exactly the same output as non-padded inputs, so a better WER can be achieved by not padding the input at all.

  • Test WER Audio Only: 0.176
  • Test CER Audio Only: 0.063
  • Test WER Visual Only: 0.460
  • Test CER Visual Only: 0.242
  • Test WER Audio Visual: 0.404
  • Test CER Audio Visual: 0.174
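
For readers unfamiliar with the metrics, the following is a minimal sketch of how WER and CER are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, at word and character level respectively. The card does not say which metric implementation was used, so the function names `edit_distance` and `error_rates`, the corpus-level aggregation, and the example sentences are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Corpus-level (WER, CER): total edits divided by total reference length.

    Assumption: errors are pooled over the whole set rather than averaged
    per utterance; the card does not state which convention was used.
    """
    word_edits = word_total = char_edits = char_total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        word_edits += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        char_edits += edit_distance(ref, hyp)
        char_total += len(ref)
    return word_edits / word_total, char_edits / char_total


# Made-up transcripts, scored one utterance at a time (as with batch_size=1).
refs = ["the cat sat on the mat", "hello world"]
hyps = ["the cat sat on mat", "hello word"]
wer, cer = error_rates(refs, hyps)
print(f"WER={wer:.3f} CER={cer:.3f}")
```

Because each utterance is scored on its own, this loop corresponds to evaluating without padding; how the transcripts are produced by the model is outside the scope of the sketch.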

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 1
  • seed: 42
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000.0
  • num_epochs: 20.0
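
To make the schedule concrete, here is a small sketch of a linear learning-rate schedule with warmup, using the learning rate (0.0001) and warmup steps (1000) listed above. The function `linear_lr` is a hypothetical helper, not the training code itself. The total of 14660 steps is an assumption, estimated from the training log (about 733 steps per epoch over 20 epochs), since the card does not state it.

```python
def linear_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 1000,
              total_steps: int = 14660) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then decays
    linearly to 0 at total_steps (total_steps is an estimate, see above).
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points during training.
for step in (0, 500, 1000, 8000, 14660):
    print(step, linear_lr(step))
```

Under these assumptions the peak learning rate is reached at step 1000, and the best checkpoint in the log (step 8000) falls roughly halfway through the decay.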

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|---|---|---|---|---|---|
| 0.6368 | 0.6821 | 500 | 1.4708 | 0.5569 | 0.4089 |
| 0.8647 | 1.3643 | 1000 | 1.4219 | 0.5721 | 0.4132 |
| 0.8782 | 2.0464 | 1500 | 1.2325 | 0.5560 | 0.3862 |
| 1.3794 | 2.7285 | 2000 | 1.1767 | 0.5476 | 0.3726 |
| 0.6026 | 3.4106 | 2500 | 1.2343 | 0.5449 | 0.3741 |
| 1.3744 | 4.0928 | 3000 | 1.1807 | 0.5377 | 0.3707 |
| 0.5876 | 4.7749 | 3500 | 1.0375 | 0.5177 | 0.3506 |
| 1.0793 | 5.4570 | 4000 | 1.0642 | 0.5186 | 0.3559 |
| 1.0835 | 6.1392 | 4500 | 1.0440 | 0.5341 | 0.3629 |
| 1.3389 | 6.8213 | 5000 | 0.9720 | 0.5032 | 0.3437 |
| 1.3464 | 7.5034 | 5500 | 0.9823 | 0.5339 | 0.3599 |
| 1.0784 | 8.1855 | 6000 | 1.0128 | 0.5233 | 0.3524 |
| 1.0948 | 8.8677 | 6500 | 1.0766 | 0.5071 | 0.3497 |
| 0.8240 | 9.5498 | 7000 | 1.0264 | 0.5050 | 0.3477 |
| 1.0965 | 10.2319 | 7500 | 1.1045 | 0.5195 | 0.3572 |
| 1.8511 | 10.9141 | 8000 | 0.8566 | 0.4860 | 0.3326 |
| 0.8268 | 11.5962 | 8500 | 0.9610 | 0.5128 | 0.3470 |
| 1.5900 | 12.2783 | 9000 | 1.0006 | 0.5167 | 0.3499 |
| 1.5875 | 12.9604 | 9500 | 1.1456 | 0.5219 | 0.3580 |
| 1.0825 | 13.6426 | 10000 | 1.0215 | 0.5180 | 0.3532 |
| 0.5846 | 14.3247 | 10500 | 1.0610 | 0.5155 | 0.3538 |
| 0.8231 | 15.0068 | 11000 | 0.8984 | 0.5095 | 0.3439 |
| 0.8113 | 15.6889 | 11500 | 0.9879 | 0.5107 | 0.3484 |
| 1.6159 | 16.3711 | 12000 | 1.0044 | 0.5155 | 0.3524 |
| 1.2974 | 17.0532 | 12500 | 0.9897 | 0.5051 | 0.3454 |
| 0.8617 | 17.7353 | 13000 | 1.0009 | 0.5060 | 0.3458 |
| 0.2714 | 18.4175 | 13500 | 0.9957 | 0.5083 | 0.3471 |
| 0.7658 | 19.0996 | 14000 | 0.9535 | 0.5068 | 0.3454 |
| 1.5706 | 19.7817 | 14500 | 0.9963 | 0.5084 | 0.3466 |
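
The headline results at the top of the card (loss 0.8566, WER 0.4860, CER 0.3326) match the step-8000 row of the log above, which suggests the best checkpoint by validation metric was kept. The sketch below shows that selection on a few rows copied from the log, and also derives the approximate steps per epoch. The training-set size estimate assumes a single device and no gradient accumulation, neither of which the card confirms.

```python
# (step, epoch, validation_loss, wer, cer): a subset of rows from the log above
rows = [
    (5000, 6.8213, 0.9720, 0.5032, 0.3437),
    (8000, 10.9141, 0.8566, 0.4860, 0.3326),
    (11000, 15.0068, 0.8984, 0.5095, 0.3439),
    (14500, 19.7817, 0.9963, 0.5084, 0.3466),
]

# Pick the checkpoint with the lowest validation WER.
best = min(rows, key=lambda r: r[3])
print("best step:", best[0], "WER:", best[3])

# Steps per epoch follow from any (step, epoch) pair in the log.
last_step, last_epoch = rows[-1][0], rows[-1][1]
steps_per_epoch = round(last_step / last_epoch)

# Assumption: one device, no gradient accumulation, train_batch_size = 64.
approx_samples = steps_per_epoch * 64
print("steps per epoch:", steps_per_epoch, "approx. training samples:", approx_samples)
```

Note that the same row also has the lowest validation loss and CER in the log, so the choice of selection metric does not change the result here.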

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model size: 0.5B params (Safetensors, tensor type F32)