
20230903015507

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 0.8747
  • Accuracy: 0.6505
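
A minimal inference sketch is shown below. It assumes the checkpoint loads as a sequence-classification model; the card does not name the specific SuperGLUE task, so the input pair is only a placeholder:

```python
# Minimal sketch, assuming a sequence-classification checkpoint. The specific
# SuperGLUE task (and therefore the input format and label meanings) is not
# stated on this card, so the example inputs below are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230903015507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder sentence pair; replace with inputs matching the training task.
inputs = tokenizer("first text", "second text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```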

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
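
In the absence of details on the card, a minimal sketch of loading SuperGLUE with 🤗 Datasets is given below; "boolq" is an assumed placeholder config, since the card does not state which SuperGLUE task was used:

```python
# Minimal sketch, assuming 🤗 Datasets; "boolq" is a placeholder config,
# since the card does not state which SuperGLUE task was used.
from datasets import load_dataset

dataset = load_dataset("super_glue", "boolq")
print(dataset["train"][0])       # inspect one training example
print(dataset["validation"][0])  # inspect one evaluation example
```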

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of equivalent TrainingArguments follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
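
These settings map directly onto 🤗 TrainingArguments; a minimal sketch is shown below. The output directory is a placeholder, the per-device batch sizes assume single-device training, and model/data wiring is omitted:

```python
# Minimal sketch of the reported hyperparameters as 🤗 TrainingArguments.
# output_dir is a placeholder; per_device_* sizes assume a single device.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```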

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| No log        | 1.0   | 340   | 0.6715          | 0.5172   |
| 0.6923        | 2.0   | 680   | 0.6802          | 0.5      |
| 0.6863        | 3.0   | 1020  | 0.6721          | 0.5      |
| 0.6863        | 4.0   | 1360  | 0.7046          | 0.5      |
| 0.6843        | 5.0   | 1700  | 0.6757          | 0.5      |
| 0.6885        | 6.0   | 2040  | 0.6788          | 0.5      |
| 0.6885        | 7.0   | 2380  | 0.6702          | 0.5      |
| 0.686         | 8.0   | 2720  | 0.6763          | 0.5      |
| 0.6858        | 9.0   | 3060  | 0.6777          | 0.5      |
| 0.6858        | 10.0  | 3400  | 0.6804          | 0.5      |
| 0.6868        | 11.0  | 3740  | 0.6711          | 0.5      |
| 0.6817        | 12.0  | 4080  | 0.6777          | 0.5      |
| 0.6817        | 13.0  | 4420  | 0.6960          | 0.5      |
| 0.6805        | 14.0  | 4760  | 0.6901          | 0.5      |
| 0.6823        | 15.0  | 5100  | 0.6715          | 0.5      |
| 0.6823        | 16.0  | 5440  | 0.6738          | 0.5016   |
| 0.6776        | 17.0  | 5780  | 0.6813          | 0.5      |
| 0.676         | 18.0  | 6120  | 0.6718          | 0.5      |
| 0.676         | 19.0  | 6460  | 0.6727          | 0.5      |
| 0.6762        | 20.0  | 6800  | 0.6742          | 0.4984   |
| 0.6748        | 21.0  | 7140  | 0.6699          | 0.5282   |
| 0.6748        | 22.0  | 7480  | 0.6624          | 0.5141   |
| 0.6749        | 23.0  | 7820  | 0.7549          | 0.5705   |
| 0.6441        | 24.0  | 8160  | 0.6447          | 0.6238   |
| 0.6189        | 25.0  | 8500  | 0.6692          | 0.6113   |
| 0.6189        | 26.0  | 8840  | 0.6171          | 0.6771   |
| 0.582         | 27.0  | 9180  | 0.7757          | 0.5831   |
| 0.5622        | 28.0  | 9520  | 0.8074          | 0.6050   |
| 0.5622        | 29.0  | 9860  | 0.6636          | 0.6614   |
| 0.5303        | 30.0  | 10200 | 0.7353          | 0.6458   |
| 0.5188        | 31.0  | 10540 | 0.6546          | 0.6536   |
| 0.5188        | 32.0  | 10880 | 0.8451          | 0.6082   |
| 0.5007        | 33.0  | 11220 | 0.7618          | 0.6442   |
| 0.4847        | 34.0  | 11560 | 0.6832          | 0.6583   |
| 0.4847        | 35.0  | 11900 | 0.7070          | 0.6442   |
| 0.4719        | 36.0  | 12240 | 0.6991          | 0.6536   |
| 0.4523        | 37.0  | 12580 | 0.7525          | 0.6661   |
| 0.4523        | 38.0  | 12920 | 0.7912          | 0.6348   |
| 0.4447        | 39.0  | 13260 | 0.7760          | 0.6536   |
| 0.439         | 40.0  | 13600 | 0.8018          | 0.6458   |
| 0.439         | 41.0  | 13940 | 0.7104          | 0.6708   |
| 0.4248        | 42.0  | 14280 | 0.7607          | 0.6599   |
| 0.4063        | 43.0  | 14620 | 0.6979          | 0.6803   |
| 0.4063        | 44.0  | 14960 | 0.7796          | 0.6614   |
| 0.4123        | 45.0  | 15300 | 0.7394          | 0.6708   |
| 0.3984        | 46.0  | 15640 | 0.7791          | 0.6599   |
| 0.3984        | 47.0  | 15980 | 0.7433          | 0.6614   |
| 0.3871        | 48.0  | 16320 | 0.7870          | 0.6442   |
| 0.3787        | 49.0  | 16660 | 0.7256          | 0.6755   |
| 0.3884        | 50.0  | 17000 | 0.8035          | 0.6536   |
| 0.3884        | 51.0  | 17340 | 0.7809          | 0.6489   |
| 0.373         | 52.0  | 17680 | 0.7920          | 0.6567   |
| 0.3704        | 53.0  | 18020 | 0.8107          | 0.6661   |
| 0.3704        | 54.0  | 18360 | 0.8759          | 0.6113   |
| 0.3628        | 55.0  | 18700 | 0.8727          | 0.6332   |
| 0.3518        | 56.0  | 19040 | 0.8756          | 0.6254   |
| 0.3518        | 57.0  | 19380 | 0.8555          | 0.6317   |
| 0.3536        | 58.0  | 19720 | 0.8082          | 0.6254   |
| 0.3504        | 59.0  | 20060 | 0.7880          | 0.6614   |
| 0.3504        | 60.0  | 20400 | 0.9100          | 0.6301   |
| 0.3466        | 61.0  | 20740 | 0.8614          | 0.6207   |
| 0.3425        | 62.0  | 21080 | 0.8712          | 0.6301   |
| 0.3425        | 63.0  | 21420 | 0.8285          | 0.6614   |
| 0.339         | 64.0  | 21760 | 0.9010          | 0.6599   |
| 0.3339        | 65.0  | 22100 | 0.9055          | 0.6426   |
| 0.3339        | 66.0  | 22440 | 0.8365          | 0.6646   |
| 0.3294        | 67.0  | 22780 | 0.8333          | 0.6505   |
| 0.3365        | 68.0  | 23120 | 0.8414          | 0.6426   |
| 0.3365        | 69.0  | 23460 | 0.8855          | 0.6395   |
| 0.332         | 70.0  | 23800 | 0.9028          | 0.6364   |
| 0.3171        | 71.0  | 24140 | 0.8584          | 0.6364   |
| 0.3171        | 72.0  | 24480 | 0.8482          | 0.6536   |
| 0.3204        | 73.0  | 24820 | 0.8713          | 0.6426   |
| 0.3289        | 74.0  | 25160 | 0.8881          | 0.6473   |
| 0.3139        | 75.0  | 25500 | 0.8588          | 0.6473   |
| 0.3139        | 76.0  | 25840 | 0.8772          | 0.6473   |
| 0.3159        | 77.0  | 26180 | 0.9019          | 0.6536   |
| 0.306         | 78.0  | 26520 | 0.8819          | 0.6505   |
| 0.306         | 79.0  | 26860 | 0.8837          | 0.6473   |
| 0.3091        | 80.0  | 27200 | 0.8747          | 0.6505   |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
