whisper_large_v3

This model was trained from scratch on an unknown dataset. Its results on the evaluation set are reported per epoch in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 25
mixed_precision_training: Native AMP
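
As a rough illustration of how these settings interact, the sketch below recomputes the effective batch size and evaluates a linear warmup-then-decay learning-rate schedule. It is a minimal sketch under stated assumptions, not the training code: `NUM_DEVICES = 1` and `TOTAL_STEPS = 52775` (the last step in the results table) are assumptions, and `effective_batch_size` and `linear_lr` are hypothetical helper names.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 1000

# Assumptions, not stated in the card: a single device, and a schedule
# horizon equal to the last step listed in the results table.
NUM_DEVICES = 1
TOTAL_STEPS = 52775


def effective_batch_size(per_device, accumulation, devices):
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step, peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))
    for step in (0, 500, 1000, 2111, 52775):
        print(step, linear_lr(step))
```

Under the single-device assumption, 8 samples per step times 2 accumulation steps gives 16, which agrees with the reported total_train_batch_size.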

Training Loss	Epoch	Step	Cer	Validation Loss	Wer
1.8187	1.0	2111	45.3063	0.6914	66.9946
0.7214	2.0	4222	37.9743	0.6309	55.6871
0.5298	3.0	6333	29.0135	0.6026	45.6557
0.4164	4.0	8444	35.3396	0.6037	54.7338
0.3355	5.0	10555	27.3975	0.5956	42.6625
0.2702	6.0	12666	26.9102	0.6047	42.2039
0.2202	7.0	14777	21.7689	0.6023	35.8946
0.1806	8.0	16888	20.1071	0.6053	32.7984
0.1496	9.0	18999	20.3211	0.6262	33.2826
0.1227	10.0	21110	19.5237	0.6374	31.9854
0.1013	11.0	23221	18.4214	0.6532	30.6836
0.0859	12.0	25332	18.6292	0.6505	30.9128
0.0728	13.0	27443	19.0582	0.6658	31.7761
0.0629	14.0	29554	17.9456	0.6691	30.1198
0.0549	15.0	31665	17.5997	0.6693	29.5186
0.0479	16.0	33776	18.0434	0.6894	30.0882
0.043	17.0	35887	17.4846	0.6831	29.3805
0.0385	18.0	37998	17.9625	0.6906	29.9607
0.0344	19.0	40109	16.9491	0.6914	28.6015
0.0315	20.0	42220	16.9293	0.6968	28.3968
0.029	21.0	44331	16.8717	0.6966	28.3301
0.0254	22.0	46442	16.9834	0.6997	28.4600
0.0236	23.0	48553	16.9067	0.6967	28.3442
0.0227	24.0	50664	16.9137	0.7046	28.2646
0.0212	25.0	52775	16.8825	0.7053	28.2705
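
The table can be scanned programmatically to see which epoch is best under each metric. The sketch below is one possible way to do that; the rows are copied from the table above, while `ROWS`, `best_epoch`, and the column constants are hypothetical names introduced only for illustration.

```python
# (epoch, training_loss, validation_loss, cer, wer), copied from the table above.
ROWS = [
    (1, 1.8187, 0.6914, 45.3063, 66.9946),
    (2, 0.7214, 0.6309, 37.9743, 55.6871),
    (3, 0.5298, 0.6026, 29.0135, 45.6557),
    (4, 0.4164, 0.6037, 35.3396, 54.7338),
    (5, 0.3355, 0.5956, 27.3975, 42.6625),
    (6, 0.2702, 0.6047, 26.9102, 42.2039),
    (7, 0.2202, 0.6023, 21.7689, 35.8946),
    (8, 0.1806, 0.6053, 20.1071, 32.7984),
    (9, 0.1496, 0.6262, 20.3211, 33.2826),
    (10, 0.1227, 0.6374, 19.5237, 31.9854),
    (11, 0.1013, 0.6532, 18.4214, 30.6836),
    (12, 0.0859, 0.6505, 18.6292, 30.9128),
    (13, 0.0728, 0.6658, 19.0582, 31.7761),
    (14, 0.0629, 0.6691, 17.9456, 30.1198),
    (15, 0.0549, 0.6693, 17.5997, 29.5186),
    (16, 0.0479, 0.6894, 18.0434, 30.0882),
    (17, 0.043, 0.6831, 17.4846, 29.3805),
    (18, 0.0385, 0.6906, 17.9625, 29.9607),
    (19, 0.0344, 0.6914, 16.9491, 28.6015),
    (20, 0.0315, 0.6968, 16.9293, 28.3968),
    (21, 0.029, 0.6966, 16.8717, 28.3301),
    (22, 0.0254, 0.6997, 16.9834, 28.4600),
    (23, 0.0236, 0.6967, 16.9067, 28.3442),
    (24, 0.0227, 0.7046, 16.9137, 28.2646),
    (25, 0.0212, 0.7053, 16.8825, 28.2705),
]

# Column indices into each row tuple.
VAL_LOSS, CER, WER = 2, 3, 4


def best_epoch(rows, column):
    """Return (epoch, value) for the row with the lowest value in `column`."""
    row = min(rows, key=lambda r: r[column])
    return row[0], row[column]


if __name__ == "__main__":
    for name, col in (("validation loss", VAL_LOSS), ("CER", CER), ("WER", WER)):
        epoch, value = best_epoch(ROWS, col)
        print(f"lowest {name}: {value} at epoch {epoch}")
```

On these rows, validation loss is lowest at epoch 5 while CER and WER keep improving until epochs 21 and 24, so the choice of checkpoint depends on which metric matters for the intended use.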

Safetensors

Model size: 2B params
Tensor type: F32
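
For a rough sense of what these figures imply for storage and memory, the sketch below multiplies the parameter count by the bytes per element of the tensor type. It is a back-of-the-envelope estimate only: "2B params" is a rounded figure, so the result is approximate, and `weight_bytes` and `to_gb` are hypothetical helper names.

```python
# Bytes per element for common tensor types.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params, tensor_type):
    """Approximate size of the raw weights in bytes."""
    return num_params * BYTES_PER_ELEMENT[tensor_type]


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


if __name__ == "__main__":
    # Assumption: the rounded "2B params" figure is taken at face value.
    size = weight_bytes(2_000_000_000, "F32")
    print(f"approx. {to_gb(size):.1f} GB of weights in F32")
```

This counts only the weights; activations and any optimizer state during training would add to it.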