# 20230824043537
This model is a fine-tuned version of `bert-large-cased` on the `super_glue` dataset. It achieves the following results on the evaluation set:
- Loss: 0.3141
- Accuracy: 0.7401
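The card does not include a usage snippet, so here is a minimal loading sketch using the `transformers` API. The repository id is a placeholder, and treating the input as a sentence pair is an assumption: the card never names the SuperGLUE subset, although 623 optimizer steps per epoch at batch size 4 (about 2,490 training examples) is consistent with a pair task such as RTE.

```python
# Minimal loading sketch -- the repo id below is hypothetical.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo_id = "<user>/20230824043537"  # placeholder; substitute the actual Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Assumption: a sentence-pair (entailment-style) SuperGLUE task.
inputs = tokenizer(
    "A new model was fine-tuned on SuperGLUE.",
    "A model was trained.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```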
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.003
- train_batch_size: 4
- eval_batch_size: 8
- seed: 11
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 60.0
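The training script itself is not part of the card. Assuming the run used the standard `Trainer`, the list above maps onto `TrainingArguments` roughly as follows (Transformers 4.26 argument names; `output_dir` and the per-epoch evaluation/logging strategies are inferred from the results table, not stated in the card):

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the reported hyperparameters, not the actual script.
training_args = TrainingArguments(
    output_dir="20230824043537",     # assumption
    learning_rate=0.003,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=60.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    evaluation_strategy="epoch",     # assumption: the table reports per-epoch eval
    logging_strategy="epoch",
)
```

Note that 0.003 is well above the 1e-5 to 5e-5 range typically used when fine-tuning BERT-large; the value is reproduced here exactly as reported.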
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
0.7925 | 1.0 | 623 | 0.8673 | 0.4729 |
0.6122 | 2.0 | 1246 | 0.4006 | 0.5415 |
0.5656 | 3.0 | 1869 | 1.2100 | 0.4729 |
0.5981 | 4.0 | 2492 | 0.4232 | 0.5632 |
0.5284 | 5.0 | 3115 | 0.6388 | 0.5523 |
0.6128 | 6.0 | 3738 | 0.4463 | 0.5307 |
0.4769 | 7.0 | 4361 | 0.4020 | 0.6065 |
0.4415 | 8.0 | 4984 | 0.3773 | 0.6029 |
0.4284 | 9.0 | 5607 | 0.3718 | 0.6679 |
0.3893 | 10.0 | 6230 | 0.3479 | 0.6606 |
0.3707 | 11.0 | 6853 | 0.3415 | 0.6751 |
0.3845 | 12.0 | 7476 | 0.3645 | 0.6787 |
0.3667 | 13.0 | 8099 | 0.3591 | 0.6895 |
0.3674 | 14.0 | 8722 | 0.3526 | 0.6931 |
0.3561 | 15.0 | 9345 | 0.3187 | 0.7292 |
0.342 | 16.0 | 9968 | 0.3318 | 0.7004 |
0.3305 | 17.0 | 10591 | 0.3185 | 0.7004 |
0.3269 | 18.0 | 11214 | 0.3733 | 0.6534 |
0.3341 | 19.0 | 11837 | 0.3197 | 0.7040 |
0.3214 | 20.0 | 12460 | 0.3166 | 0.7148 |
0.3109 | 21.0 | 13083 | 0.3257 | 0.7148 |
0.3125 | 22.0 | 13706 | 0.3299 | 0.7292 |
0.3097 | 23.0 | 14329 | 0.4120 | 0.6895 |
0.2918 | 24.0 | 14952 | 0.3158 | 0.7148 |
0.2792 | 25.0 | 15575 | 0.3077 | 0.7256 |
0.2766 | 26.0 | 16198 | 0.3078 | 0.7292 |
0.2811 | 27.0 | 16821 | 0.3033 | 0.7256 |
0.2719 | 28.0 | 17444 | 0.3017 | 0.7148 |
0.2661 | 29.0 | 18067 | 0.2947 | 0.7184 |
0.263 | 30.0 | 18690 | 0.3416 | 0.7329 |
0.2633 | 31.0 | 19313 | 0.3170 | 0.7256 |
0.2517 | 32.0 | 19936 | 0.3063 | 0.7220 |
0.2486 | 33.0 | 20559 | 0.3137 | 0.7256 |
0.252 | 34.0 | 21182 | 0.3118 | 0.7256 |
0.2396 | 35.0 | 21805 | 0.2980 | 0.7220 |
0.2471 | 36.0 | 22428 | 0.3050 | 0.7329 |
0.2361 | 37.0 | 23051 | 0.3366 | 0.7220 |
0.2358 | 38.0 | 23674 | 0.3080 | 0.7473 |
0.2231 | 39.0 | 24297 | 0.3191 | 0.7437 |
0.2298 | 40.0 | 24920 | 0.3018 | 0.7148 |
0.2241 | 41.0 | 25543 | 0.3090 | 0.7401 |
0.2243 | 42.0 | 26166 | 0.3137 | 0.7401 |
0.2237 | 43.0 | 26789 | 0.3277 | 0.7365 |
0.2147 | 44.0 | 27412 | 0.3116 | 0.7437 |
0.2149 | 45.0 | 28035 | 0.3289 | 0.7365 |
0.2087 | 46.0 | 28658 | 0.3241 | 0.7292 |
0.21 | 47.0 | 29281 | 0.3060 | 0.7365 |
0.214 | 48.0 | 29904 | 0.3311 | 0.7329 |
0.2108 | 49.0 | 30527 | 0.3144 | 0.7437 |
0.2029 | 50.0 | 31150 | 0.3094 | 0.7401 |
0.2028 | 51.0 | 31773 | 0.3141 | 0.7473 |
0.2018 | 52.0 | 32396 | 0.3188 | 0.7437 |
0.2079 | 53.0 | 33019 | 0.3138 | 0.7365 |
0.1982 | 54.0 | 33642 | 0.3109 | 0.7401 |
0.1926 | 55.0 | 34265 | 0.3118 | 0.7437 |
0.1972 | 56.0 | 34888 | 0.3270 | 0.7401 |
0.1986 | 57.0 | 35511 | 0.3098 | 0.7365 |
0.1928 | 58.0 | 36134 | 0.3131 | 0.7401 |
0.1974 | 59.0 | 36757 | 0.3132 | 0.7401 |
0.1927 | 60.0 | 37380 | 0.3141 | 0.7401 |
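Validation accuracy peaks at 0.7473 (epochs 38 and 51) and ends at 0.7401 after epoch 60, matching the headline figures above, which suggests the published weights are the final checkpoint rather than the best-scoring one. For reference, here is a sketch of the kind of `compute_metrics` hook that would produce the accuracy column, again assuming the RTE subset (the card does not name the task):

```python
import numpy as np
import evaluate
from datasets import load_dataset

dataset = load_dataset("super_glue", "rte")  # assumed subset, ~2,490 train rows
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    """Per-epoch metric hook for Trainer: argmax over logits, then accuracy."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

# Passed to Trainer as compute_metrics=compute_metrics; exercised here on dummy
# logits so the function can be sanity-checked standalone.
print(compute_metrics((np.array([[0.2, 0.8], [0.9, 0.1]]), np.array([1, 0]))))
# {'accuracy': 1.0}
```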
### Framework versions
- Transformers 4.26.1
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3