ft_rugec_A

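This model is a fine-tuned version of mika5883/pretrain_rugec_msu. The fine-tuning dataset is not specified. No summary evaluation metrics are reported; the validation loss recorded during training is listed in the table at the end of this card.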

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
num_epochs: 3
mixed_precision_training: Native AMP
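
As a rough illustration of what lr_scheduler_type: linear implies for the learning rate, here is a minimal sketch of a linear decay schedule. Several values are assumptions rather than facts from this card: the total of 2874 steps is estimated from the step/epoch ratios in the training log (about 958 steps per epoch over 3 epochs), warmup is assumed to be zero because no warmup setting is listed, and `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, base_lr=3e-05, total_steps=2874, warmup_steps=0):
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    total_steps=2874 is an estimate (3 epochs x ~958 steps per epoch) and
    warmup_steps=0 is an assumption; neither value is stated in the card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Approximate epoch boundaries under the assumptions above.
    for step in (0, 958, 1916, 2874):
        print(step, linear_lr(step))
```

Under these assumptions the rate falls to roughly two thirds of its starting value by the end of the first epoch and reaches zero at the end of the third.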

Training Loss	Epoch	Step	Validation Loss
0.2509	0.1044	100	0.2356
0.1867	0.2088	200	0.2216
0.1755	0.3132	300	0.2133
0.1629	0.4175	400	0.2101
0.1603	0.5219	500	0.2097
0.1641	0.6263	600	0.2078
0.158	0.7307	700	0.2041
0.1647	0.8351	800	0.1978
0.1494	0.9395	900	0.2037
0.1363	1.0438	1000	0.2025
0.1323	1.1482	1100	0.2017
0.1256	1.2526	1200	0.2039
0.126	1.3570	1300	0.2030
0.1272	1.4614	1400	0.2056
0.1227	1.5658	1500	0.2055
0.1302	1.6701	1600	0.1990
0.1226	1.7745	1700	0.2035
0.1168	1.8789	1800	0.2011
0.1285	1.9833	1900	0.1996
0.1137	2.0877	2000	0.1991
0.1107	2.1921	2100	0.2025
0.112	2.2965	2200	0.2025
0.1092	2.4008	2300	0.2033
0.1049	2.5052	2400	0.2046
0.1085	2.6096	2500	0.2046
0.1094	2.7140	2600	0.2034
0.1099	2.8184	2700	0.2034
0.1182	2.9228	2800	0.2033
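
To make the log above easier to inspect, the following sketch finds the logged step with the lowest validation loss. The rows are copied from the table; `LOG` and `best_checkpoint` are hypothetical names used only for illustration, and the card does not say which checkpoint was actually published.

```python
# (step, training_loss, validation_loss) rows copied from the table above.
LOG = [
    (100, 0.2509, 0.2356), (200, 0.1867, 0.2216), (300, 0.1755, 0.2133),
    (400, 0.1629, 0.2101), (500, 0.1603, 0.2097), (600, 0.1641, 0.2078),
    (700, 0.158, 0.2041), (800, 0.1647, 0.1978), (900, 0.1494, 0.2037),
    (1000, 0.1363, 0.2025), (1100, 0.1323, 0.2017), (1200, 0.1256, 0.2039),
    (1300, 0.126, 0.2030), (1400, 0.1272, 0.2056), (1500, 0.1227, 0.2055),
    (1600, 0.1302, 0.1990), (1700, 0.1226, 0.2035), (1800, 0.1168, 0.2011),
    (1900, 0.1285, 0.1996), (2000, 0.1137, 0.1991), (2100, 0.1107, 0.2025),
    (2200, 0.112, 0.2025), (2300, 0.1092, 0.2033), (2400, 0.1049, 0.2046),
    (2500, 0.1085, 0.2046), (2600, 0.1094, 0.2034), (2700, 0.1099, 0.2034),
    (2800, 0.1182, 0.2033),
]


def best_checkpoint(log):
    """Return the (step, train_loss, val_loss) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


if __name__ == "__main__":
    step, train_loss, val_loss = best_checkpoint(LOG)
    print(f"lowest validation loss {val_loss} at step {step}")
```

On this log the minimum is 0.1978 at step 800. Training loss continues to fall after that point while validation loss stays between roughly 0.199 and 0.206.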