MSP-Fusion

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.8566
WER: 0.4860
CER: 0.3326

Evaluation

Note: we deliberately evaluate the test dataset with batch_size=1 because of this issue. Padded inputs do not yield exactly the same output as non-padded inputs, so a better WER can be achieved by not padding the input at all.
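To make the padding point concrete, here is a toy Python sketch (not the model's actual data collator) that right-pads a batch to its longest sequence and counts the padded positions. Integer lists stand in for the audio or visual feature frames, and `pad_batch`, `count_padding` and `PAD_ID` are hypothetical names. With batch_size=1 every sample is its own batch, so no padding is ever introduced.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding value


def pad_batch(batch: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest one in the batch.

    Returns the padded sequences and an attention mask (1 = real, 0 = padding).
    """
    max_len = max(len(seq) for seq in batch)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return padded, mask


def count_padding(mask: List[List[int]]) -> int:
    """Number of padded positions the model would see."""
    return sum(row.count(0) for row in mask)


samples = [[5, 8, 2], [7, 1], [9, 4, 6, 3]]

# Batched evaluation: the two shorter samples are padded up to length 4.
_, mask = pad_batch(samples)
print(count_padding(mask))  # 3

# batch_size=1: each sample is its own batch, so nothing is padded.
print(sum(count_padding(pad_batch([s])[1]) for s in samples))  # 0
```

Even when padded positions are masked, they can still shift a model's outputs slightly (for example through normalization or convolution over padded frames), which is the kind of discrepancy the note above avoids.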

Modality	Test WER	Test CER
Audio only	0.176	0.063
Visual only	0.460	0.242
Audio-visual	0.404	0.174
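WER and CER are both edit-distance rates: the number of substitutions, deletions and insertions needed to align the hypothesis with the reference, divided by the number of reference words (WER) or characters (CER). The card does not say which library computed its scores, so the following is only a minimal corpus-level sketch in plain Python; the function names are illustrative.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level word error rate."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level character error rate (spaces count as characters here)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
print(round(wer(refs, hyps), 3))  # 0.333 (1 substitution + 1 deletion over 6 words)
print(round(cer(refs, hyps), 3))  # 0.227 (5 character edits over 22 characters)
```

Summing errors over the whole set before dividing, rather than averaging per-utterance rates, weights long utterances more heavily; whether the reported scores follow this convention is an assumption.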

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 64
eval_batch_size: 1
seed: 42
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 20
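With a linear scheduler and 1000 warmup steps, the learning rate ramps up linearly from 0 to 0.0001 and then decays linearly to 0 by the final training step. The sketch below mirrors the multiplier used by Transformers' `get_linear_schedule_with_warmup`. The total of 14,660 steps is an assumption inferred from the training results (step 500 falls at epoch 0.6821, i.e. about 733 optimizer steps per epoch, times 20 epochs).

```python
def linear_warmup_lr(step: int,
                     base_lr: float = 1e-4,
                     warmup_steps: int = 1000,
                     total_steps: int = 14_660) -> float:
    """Learning rate at a given optimizer step under linear warmup + linear decay.

    total_steps=14_660 is an assumption (about 733 steps/epoch x 20 epochs).
    """
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr down to 0 at total_steps (never negative).
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for s in (0, 500, 1000, 8000, 14_660):
    print(s, f"{linear_warmup_lr(s):.2e}")
```

Under these assumptions, the learning rate at step 8000 (the checkpoint with the lowest validation loss) is still roughly half of its peak value.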

Training results

Training Loss	Epoch	Step	Validation Loss	Validation WER	Validation CER
0.6368	0.6821	500	1.4708	0.5569	0.4089
0.8647	1.3643	1000	1.4219	0.5721	0.4132
0.8782	2.0464	1500	1.2325	0.5560	0.3862
1.3794	2.7285	2000	1.1767	0.5476	0.3726
0.6026	3.4106	2500	1.2343	0.5449	0.3741
1.3744	4.0928	3000	1.1807	0.5377	0.3707
0.5876	4.7749	3500	1.0375	0.5177	0.3506
1.0793	5.4570	4000	1.0642	0.5186	0.3559
1.0835	6.1392	4500	1.0440	0.5341	0.3629
1.3389	6.8213	5000	0.9720	0.5032	0.3437
1.3464	7.5034	5500	0.9823	0.5339	0.3599
1.0784	8.1855	6000	1.0128	0.5233	0.3524
1.0948	8.8677	6500	1.0766	0.5071	0.3497
0.8240	9.5498	7000	1.0264	0.5050	0.3477
1.0965	10.2319	7500	1.1045	0.5195	0.3572
1.8511	10.9141	8000	0.8566	0.4860	0.3326
0.8268	11.5962	8500	0.9610	0.5128	0.3470
1.5900	12.2783	9000	1.0006	0.5167	0.3499
1.5875	12.9604	9500	1.1456	0.5219	0.3580
1.0825	13.6426	10000	1.0215	0.5180	0.3532
0.5846	14.3247	10500	1.0610	0.5155	0.3538
0.8231	15.0068	11000	0.8984	0.5095	0.3439
0.8113	15.6889	11500	0.9879	0.5107	0.3484
1.6159	16.3711	12000	1.0044	0.5155	0.3524
1.2974	17.0532	12500	0.9897	0.5051	0.3454
0.8617	17.7353	13000	1.0009	0.5060	0.3458
0.2714	18.4175	13500	0.9957	0.5083	0.3471
0.7658	19.0996	14000	0.9535	0.5068	0.3454
1.5706	19.7817	14500	0.9963	0.5084	0.3466
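The headline evaluation results at the top of this card (Loss 0.8566, WER 0.4860, CER 0.3326) match the step-8000 row, which has the lowest validation loss in the table. The short Python sketch below, with the rows copied from the table, checks this and also recovers the number of optimizer steps per epoch. That the reported checkpoint was selected by validation loss is an assumption; the card does not state the selection criterion.

```python
# (step, epoch, validation_loss, validation_wer, validation_cer), copied from the table
ROWS = [
    (500, 0.6821, 1.4708, 0.5569, 0.4089),
    (1000, 1.3643, 1.4219, 0.5721, 0.4132),
    (1500, 2.0464, 1.2325, 0.5560, 0.3862),
    (2000, 2.7285, 1.1767, 0.5476, 0.3726),
    (2500, 3.4106, 1.2343, 0.5449, 0.3741),
    (3000, 4.0928, 1.1807, 0.5377, 0.3707),
    (3500, 4.7749, 1.0375, 0.5177, 0.3506),
    (4000, 5.4570, 1.0642, 0.5186, 0.3559),
    (4500, 6.1392, 1.0440, 0.5341, 0.3629),
    (5000, 6.8213, 0.9720, 0.5032, 0.3437),
    (5500, 7.5034, 0.9823, 0.5339, 0.3599),
    (6000, 8.1855, 1.0128, 0.5233, 0.3524),
    (6500, 8.8677, 1.0766, 0.5071, 0.3497),
    (7000, 9.5498, 1.0264, 0.5050, 0.3477),
    (7500, 10.2319, 1.1045, 0.5195, 0.3572),
    (8000, 10.9141, 0.8566, 0.4860, 0.3326),
    (8500, 11.5962, 0.9610, 0.5128, 0.3470),
    (9000, 12.2783, 1.0006, 0.5167, 0.3499),
    (9500, 12.9604, 1.1456, 0.5219, 0.3580),
    (10000, 13.6426, 1.0215, 0.5180, 0.3532),
    (10500, 14.3247, 1.0610, 0.5155, 0.3538),
    (11000, 15.0068, 0.8984, 0.5095, 0.3439),
    (11500, 15.6889, 0.9879, 0.5107, 0.3484),
    (12000, 16.3711, 1.0044, 0.5155, 0.3524),
    (12500, 17.0532, 0.9897, 0.5051, 0.3454),
    (13000, 17.7353, 1.0009, 0.5060, 0.3458),
    (13500, 18.4175, 0.9957, 0.5083, 0.3471),
    (14000, 19.0996, 0.9535, 0.5068, 0.3454),
    (14500, 19.7817, 0.9963, 0.5084, 0.3466),
]

# Checkpoint with the lowest validation loss.
best = min(ROWS, key=lambda row: row[2])
step, epoch, val_loss, val_wer, val_cer = best
print(f"best step={step} epoch={epoch:.4f} loss={val_loss:.4f} WER={val_wer:.4f} CER={val_cer:.4f}")
# best step=8000 epoch=10.9141 loss=0.8566 WER=0.4860 CER=0.3326

# Steps per epoch, from the first logged row.
steps_per_epoch = ROWS[0][0] / ROWS[0][1]
print(round(steps_per_epoch))  # 733
```

In this run the lowest validation loss and the lowest validation WER fall on the same row, so either criterion would pick the same checkpoint.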

Framework versions

Transformers 5.0.0
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2


Safetensors
Model size: 0.5B params
Tensor type: F32

Model tree for MahmoodAnaam/MSP-Fusion-V0

Finetunes: 1 model
