adapter_head_full_const_lr_1e-4_l20-l23_const_lr_1e-7_l1-l19

This model is a fine-tuned version of facebook/w2v-bert-2.0 on the common_voice_17_0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3638
  • WER: 0.1946
  • CER: 0.0323

Model description

More information needed

Intended uses & limitations

More information needed
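
No usage details are given, so here is a minimal transcription sketch, not an official recipe. It assumes the repository ships a Wav2Vec2-BERT CTC head with a matching processor; the checkpoint path and the 16 kHz mono audio file `sample.wav` are placeholders.

```python
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2BertForCTC

# Placeholder: substitute the actual repo id or local checkpoint path.
checkpoint = "path/to/checkpoint"

processor = AutoProcessor.from_pretrained(checkpoint)
model = Wav2Vec2BertForCTC.from_pretrained(checkpoint)
model.eval()

# w2v-bert-2.0 expects 16 kHz mono audio.
speech, _ = librosa.load("sample.wav", sr=16_000, mono=True)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```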

Training and evaluation data

More information needed

Training procedure
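
The run name (`adapter_head_full_const_lr_1e-4_l20-l23_const_lr_1e-7_l1-l19`) suggests a layer-wise scheme: a constant learning rate of 1e-4 for the adapter, head, and encoder layers 20-23, and 1e-7 for encoder layers 1-19, even though the hyperparameter list below records only a single global rate. How that split was implemented is not documented; the following is a rough sketch using PyTorch optimizer parameter groups, with the name matching and layer indexing as explicit guesses.

```python
import torch
from transformers import Wav2Vec2BertForCTC

model = Wav2Vec2BertForCTC.from_pretrained("facebook/w2v-bert-2.0")

# Guessed split from the run name: small constant LR for lower encoder
# layers, larger constant LR for upper layers plus adapter and CTC head.
# Whether "l1-l19" is 0- or 1-based indexing is an assumption here.
low_lr_params, high_lr_params = [], []
for name, param in model.named_parameters():
    if "encoder.layers." in name:
        layer_idx = int(name.split("encoder.layers.")[1].split(".")[0])
        if layer_idx < 19:           # layers "1-19" -> low LR
            low_lr_params.append(param)
            continue
    high_lr_params.append(param)     # layers 20-23, adapter, head, rest

optimizer = torch.optim.Adam(
    [
        {"params": high_lr_params, "lr": 1e-4},
        {"params": low_lr_params, "lr": 1e-7},
    ],
    betas=(0.9, 0.999),
    eps=1e-8,
)
```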

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
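
As a rough translation into code, these settings correspond to a `transformers.TrainingArguments` along the following lines; the output directory is a placeholder and anything not listed above is left at its default.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./w2v-bert-2.0-ctc",   # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,     # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,                         # "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```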

Training results

| Training Loss | Epoch   | Step  | Validation Loss | WER    | CER    |
|---------------|---------|-------|-----------------|--------|--------|
| 0.1716        | 2.3077  | 750   | 0.2372          | 0.3536 | 0.0576 |
| 0.0888        | 4.6154  | 1500  | 0.2341          | 0.3066 | 0.0509 |
| 0.0487        | 6.9231  | 2250  | 0.2555          | 0.2823 | 0.0467 |
| 0.0221        | 9.2308  | 3000  | 0.2957          | 0.2668 | 0.0444 |
| 0.0193        | 11.5385 | 3750  | 0.3013          | 0.2461 | 0.0411 |
| 0.0162        | 13.8462 | 4500  | 0.3230          | 0.2584 | 0.0431 |
| 0.0107        | 16.1538 | 5250  | 0.3377          | 0.2454 | 0.0408 |
| 0.0106        | 18.4615 | 6000  | 0.3370          | 0.2473 | 0.0413 |
| 0.0111        | 20.7692 | 6750  | 0.3457          | 0.2448 | 0.0414 |
| 0.0084        | 23.0769 | 7500  | 0.3279          | 0.2302 | 0.0387 |
| 0.0083        | 25.3846 | 8250  | 0.3402          | 0.2308 | 0.0382 |
| 0.009         | 27.6923 | 9000  | 0.3411          | 0.2302 | 0.0384 |
| 0.0085        | 30.0    | 9750  | 0.3311          | 0.2292 | 0.0375 |
| 0.006         | 32.3077 | 10500 | 0.3492          | 0.2238 | 0.0371 |
| 0.0063        | 34.6154 | 11250 | 0.3560          | 0.2330 | 0.0381 |
| 0.0064        | 36.9231 | 12000 | 0.3584          | 0.2259 | 0.0379 |
| 0.0054        | 39.2308 | 12750 | 0.3484          | 0.2123 | 0.0351 |
| 0.0041        | 41.5385 | 13500 | 0.3565          | 0.2131 | 0.0356 |
| 0.0044        | 43.8462 | 14250 | 0.3522          | 0.2171 | 0.0363 |
| 0.0025        | 46.1538 | 15000 | 0.3702          | 0.2084 | 0.0350 |
| 0.0073        | 48.4615 | 15750 | 0.3579          | 0.2203 | 0.0360 |
| 0.0048        | 50.7692 | 16500 | 0.3462          | 0.2116 | 0.0353 |
| 0.0053        | 53.0769 | 17250 | 0.3264          | 0.2014 | 0.0337 |
| 0.0028        | 55.3846 | 18000 | 0.3560          | 0.2059 | 0.0343 |
| 0.0039        | 57.6923 | 18750 | 0.3685          | 0.2081 | 0.0348 |
| 0.0026        | 60.0    | 19500 | 0.3649          | 0.2075 | 0.0347 |
| 0.0027        | 62.3077 | 20250 | 0.3636          | 0.2091 | 0.0350 |
| 0.0038        | 64.6154 | 21000 | 0.3675          | 0.2147 | 0.0350 |
| 0.0024        | 66.9231 | 21750 | 0.3707          | 0.2050 | 0.0341 |
| 0.0045        | 69.2308 | 22500 | 0.3397          | 0.1961 | 0.0329 |
| 0.0032        | 71.5385 | 23250 | 0.3645          | 0.1985 | 0.0332 |
| 0.0041        | 73.8462 | 24000 | 0.3451          | 0.2047 | 0.0338 |
| 0.0018        | 76.1538 | 24750 | 0.3468          | 0.1935 | 0.0321 |
| 0.0045        | 78.4615 | 25500 | 0.3366          | 0.1982 | 0.0332 |
| 0.0023        | 80.7692 | 26250 | 0.3551          | 0.1996 | 0.0336 |
| 0.0022        | 83.0769 | 27000 | 0.3778          | 0.1948 | 0.0331 |
| 0.0026        | 85.3846 | 27750 | 0.3622          | 0.1950 | 0.0328 |
| 0.0013        | 87.6923 | 28500 | 0.3600          | 0.1908 | 0.0319 |
| 0.0032        | 90.0    | 29250 | 0.3632          | 0.1945 | 0.0324 |
| 0.0027        | 92.3077 | 30000 | 0.3436          | 0.1913 | 0.0320 |
| 0.002         | 94.6154 | 30750 | 0.3721          | 0.1985 | 0.0334 |
| 0.0022        | 96.9231 | 31500 | 0.3659          | 0.1966 | 0.0330 |
| 0.0025        | 99.2308 | 32250 | 0.3638          | 0.1946 | 0.0323 |
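
WER and CER above are the usual word- and character-error rates. A minimal sketch of reproducing them with the `evaluate` library, using hypothetical example strings in place of real decoded transcripts from the common_voice_17_0 evaluation split:

```python
import evaluate

wer = evaluate.load("wer")
cer = evaluate.load("cer")

# Hypothetical strings; in practice, decode the evaluation split.
predictions = ["the cat sat on the mat"]
references = ["the cat sat on a mat"]

print("WER:", wer.compute(predictions=predictions, references=references))
print("CER:", cer.compute(predictions=predictions, references=references))
```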

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size

  • 606M parameters (Safetensors, F32)