strict_balanced_cf_seed-63_1e-3

This model was trained from scratch on an unknown dataset. On the evaluation set, the final row of the training log (step 29720) gives:

Loss: 3.1862
Accuracy: 0.4012

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 32
eval_batch_size: 64
seed: 63
gradient_accumulation_steps: 8
total_train_batch_size: 256
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 32000
num_epochs: 20.0
mixed_precision_training: Native AMP
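
The batch-size and scheduler entries above interact, so it can help to recompute them. The following plain-Python sketch recomputes the effective batch size and evaluates a linear warmup/decay schedule at the last logged step (29720). It assumes a single device and the usual linear-with-warmup rule (the learning rate rises linearly from 0 over the warmup steps, then decays linearly to 0). `effective_batch_size` and `linear_lr` are hypothetical helpers written for illustration, not library functions, and treating 29720 as the scheduler's total step count is also an assumption.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 32_000
TOTAL_STEPS = 29_720  # assumption: last step in the training log


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Samples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accum_steps * num_devices


def linear_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print(linear_lr(TOTAL_STEPS, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS))
```

Under these assumptions the warmup (32000 steps) is longer than the whole run (29720 steps), so the learning rate would still be rising at the last step and would reach roughly 93% of the configured 0.001 rather than the full value.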

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.9731	0.9998	1486	4.4109	0.2939
4.2997	1.9997	2972	3.9113	0.3321
3.6863	2.9997	4458	3.6333	0.3565
3.4969	3.9996	5944	3.4726	0.3713
3.2641	4.9995	7430	3.3771	0.3800
3.1935	5.9994	8916	3.3143	0.3861
3.0853	6.9993	10402	3.2722	0.3901
3.0449	7.9999	11889	3.2485	0.3926
2.9853	8.9998	13375	3.2332	0.3941
2.9595	9.9997	14861	3.2198	0.3959
2.9233	10.9997	16347	3.2073	0.3974
2.9006	11.9996	17833	3.2055	0.3977
2.8802	12.9995	19319	3.2030	0.3983
2.8606	13.9994	20805	3.1952	0.3993
2.8504	14.9993	22291	3.1972	0.3993
2.8315	15.9999	23778	3.1882	0.4003
2.8315	16.9998	25264	3.1864	0.4007
2.8104	17.9997	26750	3.1891	0.4008
2.8207	18.9997	28236	3.1880	0.4006
2.7999	19.9982	29720	3.1862	0.4012
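
The card does not say what the loss measures. If it is a mean per-token cross-entropy in nats (an assumption), the validation loss can be turned into a perplexity with exp(loss), which is sometimes easier to compare across runs. A minimal sketch, with `perplexity` as a hypothetical helper and three rows copied from the table above:

```python
import math


def perplexity(cross_entropy_nats):
    """Perplexity for a mean cross-entropy given in nats (assumed unit)."""
    return math.exp(cross_entropy_nats)


# (approximate epoch, validation loss) copied from the table above.
rows = [(1, 4.4109), (10, 3.2198), (20, 3.1862)]

for epoch, val_loss in rows:
    print(f"epoch ~{epoch}: validation perplexity ~ {perplexity(val_loss):.1f}")
```

If the loss is defined differently (for example, in bits or averaged per sequence), this conversion does not apply.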