
20230824043245

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6512
  • Accuracy: 0.7473

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.003
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 60.0
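With a linear scheduler and no warmup steps listed, the learning rate presumably decays from the peak value to zero over the full run. A minimal sketch of that schedule (the zero-warmup assumption and the step count, taken from the training log below, are not stated explicitly in the card):

```python
# Sketch of a linear learning-rate schedule with no warmup (an assumption;
# the card lists no warmup steps). Decays from the peak LR to zero over
# the total number of optimizer updates.
PEAK_LR = 0.003        # learning_rate from the hyperparameters above
TOTAL_STEPS = 37380    # final step in the training log (60 epochs x 623 steps)

def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates."""
    return PEAK_LR * max(0.0, 1.0 - step / TOTAL_STEPS)

print(linear_lr(0))      # peak LR at the start of training
print(linear_lr(18690))  # halfway through (end of epoch 30): half the peak
print(linear_lr(37380))  # decayed to zero at the final step
```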

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 1.0514        | 1.0   | 623   | 0.7220          | 0.5054   |
| 0.8415        | 2.0   | 1246  | 0.6761          | 0.5415   |
| 0.925         | 3.0   | 1869  | 0.7140          | 0.5126   |
| 0.8783        | 4.0   | 2492  | 0.6604          | 0.6245   |
| 0.7907        | 5.0   | 3115  | 0.6059          | 0.6787   |
| 0.7756        | 6.0   | 3738  | 0.6058          | 0.6931   |
| 0.7308        | 7.0   | 4361  | 1.0272          | 0.6173   |
| 0.7169        | 8.0   | 4984  | 0.7565          | 0.6679   |
| 0.689         | 9.0   | 5607  | 0.6401          | 0.7004   |
| 0.6368        | 10.0  | 6230  | 0.6674          | 0.7256   |
| 0.5682        | 11.0  | 6853  | 0.5540          | 0.7148   |
| 0.5974        | 12.0  | 7476  | 0.6804          | 0.7473   |
| 0.5286        | 13.0  | 8099  | 0.5929          | 0.7401   |
| 0.5348        | 14.0  | 8722  | 0.7100          | 0.7220   |
| 0.4956        | 15.0  | 9345  | 0.5456          | 0.7184   |
| 0.4654        | 16.0  | 9968  | 0.6426          | 0.7112   |
| 0.4273        | 17.0  | 10591 | 0.6307          | 0.7365   |
| 0.4259        | 18.0  | 11214 | 0.5385          | 0.7365   |
| 0.4454        | 19.0  | 11837 | 0.6045          | 0.7437   |
| 0.4176        | 20.0  | 12460 | 0.7234          | 0.7401   |
| 0.3953        | 21.0  | 13083 | 0.6217          | 0.7437   |
| 0.3847        | 22.0  | 13706 | 0.6348          | 0.7437   |
| 0.3717        | 23.0  | 14329 | 0.8536          | 0.7148   |
| 0.3512        | 24.0  | 14952 | 0.5710          | 0.7509   |
| 0.3237        | 25.0  | 15575 | 0.5594          | 0.7437   |
| 0.3102        | 26.0  | 16198 | 0.7130          | 0.7581   |
| 0.3302        | 27.0  | 16821 | 0.6404          | 0.7653   |
| 0.3066        | 28.0  | 17444 | 0.6608          | 0.7473   |
| 0.305         | 29.0  | 18067 | 0.6181          | 0.7617   |
| 0.2894        | 30.0  | 18690 | 0.7626          | 0.7329   |
| 0.2891        | 31.0  | 19313 | 0.6387          | 0.7545   |
| 0.2836        | 32.0  | 19936 | 0.5889          | 0.7437   |
| 0.2682        | 33.0  | 20559 | 0.7169          | 0.7473   |
| 0.2625        | 34.0  | 21182 | 0.6298          | 0.7617   |
| 0.246         | 35.0  | 21805 | 0.6207          | 0.7617   |
| 0.266         | 36.0  | 22428 | 0.6256          | 0.7473   |
| 0.2398        | 37.0  | 23051 | 0.7504          | 0.7617   |
| 0.2526        | 38.0  | 23674 | 0.6578          | 0.7473   |
| 0.2165        | 39.0  | 24297 | 0.6624          | 0.7617   |
| 0.2347        | 40.0  | 24920 | 0.6133          | 0.7365   |
| 0.2296        | 41.0  | 25543 | 0.6224          | 0.7509   |
| 0.2226        | 42.0  | 26166 | 0.6971          | 0.7473   |
| 0.2214        | 43.0  | 26789 | 0.6280          | 0.7509   |
| 0.2268        | 44.0  | 27412 | 0.6562          | 0.7473   |
| 0.2244        | 45.0  | 28035 | 0.6726          | 0.7509   |
| 0.2067        | 46.0  | 28658 | 0.6554          | 0.7581   |
| 0.1971        | 47.0  | 29281 | 0.5949          | 0.7581   |
| 0.2135        | 48.0  | 29904 | 0.6618          | 0.7437   |
| 0.2012        | 49.0  | 30527 | 0.6752          | 0.7581   |
| 0.1882        | 50.0  | 31150 | 0.6223          | 0.7581   |
| 0.2056        | 51.0  | 31773 | 0.6487          | 0.7473   |
| 0.1993        | 52.0  | 32396 | 0.6544          | 0.7509   |
| 0.197         | 53.0  | 33019 | 0.6673          | 0.7401   |
| 0.1867        | 54.0  | 33642 | 0.6563          | 0.7437   |
| 0.1715        | 55.0  | 34265 | 0.6780          | 0.7401   |
| 0.1787        | 56.0  | 34888 | 0.6906          | 0.7329   |
| 0.19          | 57.0  | 35511 | 0.6606          | 0.7437   |
| 0.1819        | 58.0  | 36134 | 0.6461          | 0.7437   |
| 0.1879        | 59.0  | 36757 | 0.6516          | 0.7437   |
| 0.1773        | 60.0  | 37380 | 0.6512          | 0.7473   |
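The step counts in the log let one back out the approximate training-set size: each epoch adds 623 optimizer steps, and with train_batch_size 4 that implies roughly 2,492 training examples. A quick consistency check (this assumes a single device and no gradient accumulation, neither of which the card states):

```python
# Consistency check on the training log above (assumes one device and
# no gradient accumulation, which the card does not state).
STEPS_PER_EPOCH = 623   # the Step column increases by 623 each epoch
BATCH_SIZE = 4          # train_batch_size from the hyperparameters
EPOCHS = 60             # num_epochs from the hyperparameters

total_steps = STEPS_PER_EPOCH * EPOCHS
approx_examples = STEPS_PER_EPOCH * BATCH_SIZE

print(total_steps)      # should match the final Step value in the table
print(approx_examples)  # rough training-set size implied by the log
```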

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
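To reproduce this environment, the listed versions can be pinned at install time. A sketch, assuming a standard pip setup and a CUDA 11.8-compatible machine for the cu118 PyTorch build:

```shell
# Pin the framework versions reported above.
pip install transformers==4.26.1 datasets==2.12.0 tokenizers==0.13.3
# The +cu118 build comes from PyTorch's CUDA 11.8 wheel index.
pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
```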
