
20230826052103

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.5758
  • Accuracy: 0.73
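
The card does not state the pipeline type or which SuperGLUE subtask was used, so the snippet below is only a minimal loading sketch, assuming a sequence-classification head and a sentence-pair task; the example inputs are placeholders.

```python
# Minimal loading sketch. Assumption: the checkpoint was trained with a
# sequence-classification head on a sentence-pair SuperGLUE task; the card
# does not state the pipeline type or the subtask.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230826052103")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230826052103")

# Placeholder sentence pair; replace with task-appropriate inputs.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())
```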

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
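
For reference, these values map onto `transformers.TrainingArguments` roughly as below. This is a hedged reconstruction, not the original training script; the `output_dir` and any argument not listed above are assumptions.

```python
# Hedged reconstruction of the configuration listed above (Transformers 4.26).
# output_dir is a placeholder; all other values come from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826052103",  # placeholder, not stated on the card
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```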

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.6259          | 0.48     |
| No log        | 2.0   | 50   | 0.7321          | 0.62     |
| No log        | 3.0   | 75   | 0.7953          | 0.64     |
| No log        | 4.0   | 100  | 0.6993          | 0.65     |
| No log        | 5.0   | 125  | 0.5882          | 0.62     |
| No log        | 6.0   | 150  | 0.5896          | 0.63     |
| No log        | 7.0   | 175  | 0.6143          | 0.66     |
| No log        | 8.0   | 200  | 0.7070          | 0.63     |
| No log        | 9.0   | 225  | 0.6441          | 0.67     |
| No log        | 10.0  | 250  | 0.7048          | 0.68     |
| No log        | 11.0  | 275  | 0.5610          | 0.70     |
| No log        | 12.0  | 300  | 0.6845          | 0.69     |
| No log        | 13.0  | 325  | 0.7743          | 0.67     |
| No log        | 14.0  | 350  | 0.7745          | 0.68     |
| No log        | 15.0  | 375  | 0.7992          | 0.72     |
| No log        | 16.0  | 400  | 0.7166          | 0.72     |
| No log        | 17.0  | 425  | 0.7013          | 0.75     |
| No log        | 18.0  | 450  | 0.8815          | 0.72     |
| No log        | 19.0  | 475  | 0.7997          | 0.72     |
| 0.6923        | 20.0  | 500  | 0.7411          | 0.70     |
| 0.6923        | 21.0  | 525  | 0.7322          | 0.71     |
| 0.6923        | 22.0  | 550  | 0.8924          | 0.67     |
| 0.6923        | 23.0  | 575  | 0.7238          | 0.70     |
| 0.6923        | 24.0  | 600  | 0.7785          | 0.71     |
| 0.6923        | 25.0  | 625  | 0.6886          | 0.71     |
| 0.6923        | 26.0  | 650  | 0.7782          | 0.72     |
| 0.6923        | 27.0  | 675  | 0.7322          | 0.71     |
| 0.6923        | 28.0  | 700  | 0.7590          | 0.68     |
| 0.6923        | 29.0  | 725  | 0.7170          | 0.71     |
| 0.6923        | 30.0  | 750  | 0.7993          | 0.71     |
| 0.6923        | 31.0  | 775  | 0.7465          | 0.70     |
| 0.6923        | 32.0  | 800  | 0.6627          | 0.70     |
| 0.6923        | 33.0  | 825  | 0.7128          | 0.70     |
| 0.6923        | 34.0  | 850  | 0.6699          | 0.69     |
| 0.6923        | 35.0  | 875  | 0.6974          | 0.69     |
| 0.6923        | 36.0  | 900  | 0.6626          | 0.70     |
| 0.6923        | 37.0  | 925  | 0.6843          | 0.70     |
| 0.6923        | 38.0  | 950  | 0.6846          | 0.71     |
| 0.6923        | 39.0  | 975  | 0.7098          | 0.71     |
| 0.2907        | 40.0  | 1000 | 0.6845          | 0.71     |
| 0.2907        | 41.0  | 1025 | 0.6782          | 0.71     |
| 0.2907        | 42.0  | 1050 | 0.6635          | 0.70     |
| 0.2907        | 43.0  | 1075 | 0.5903          | 0.70     |
| 0.2907        | 44.0  | 1100 | 0.6072          | 0.71     |
| 0.2907        | 45.0  | 1125 | 0.5961          | 0.72     |
| 0.2907        | 46.0  | 1150 | 0.6115          | 0.72     |
| 0.2907        | 47.0  | 1175 | 0.6240          | 0.71     |
| 0.2907        | 48.0  | 1200 | 0.6327          | 0.72     |
| 0.2907        | 49.0  | 1225 | 0.6935          | 0.71     |
| 0.2907        | 50.0  | 1250 | 0.5864          | 0.73     |
| 0.2907        | 51.0  | 1275 | 0.5779          | 0.72     |
| 0.2907        | 52.0  | 1300 | 0.6013          | 0.73     |
| 0.2907        | 53.0  | 1325 | 0.5665          | 0.75     |
| 0.2907        | 54.0  | 1350 | 0.5745          | 0.76     |
| 0.2907        | 55.0  | 1375 | 0.6108          | 0.75     |
| 0.2907        | 56.0  | 1400 | 0.5844          | 0.75     |
| 0.2907        | 57.0  | 1425 | 0.5647          | 0.77     |
| 0.2907        | 58.0  | 1450 | 0.5844          | 0.76     |
| 0.2907        | 59.0  | 1475 | 0.5720          | 0.75     |
| 0.2156        | 60.0  | 1500 | 0.5815          | 0.72     |
| 0.2156        | 61.0  | 1525 | 0.5615          | 0.73     |
| 0.2156        | 62.0  | 1550 | 0.5820          | 0.75     |
| 0.2156        | 63.0  | 1575 | 0.5712          | 0.73     |
| 0.2156        | 64.0  | 1600 | 0.5682          | 0.72     |
| 0.2156        | 65.0  | 1625 | 0.6267          | 0.73     |
| 0.2156        | 66.0  | 1650 | 0.5815          | 0.74     |
| 0.2156        | 67.0  | 1675 | 0.6171          | 0.73     |
| 0.2156        | 68.0  | 1700 | 0.5554          | 0.74     |
| 0.2156        | 69.0  | 1725 | 0.6060          | 0.72     |
| 0.2156        | 70.0  | 1750 | 0.5575          | 0.73     |
| 0.2156        | 71.0  | 1775 | 0.5885          | 0.73     |
| 0.2156        | 72.0  | 1800 | 0.5571          | 0.73     |
| 0.2156        | 73.0  | 1825 | 0.5845          | 0.73     |
| 0.2156        | 74.0  | 1850 | 0.5710          | 0.73     |
| 0.2156        | 75.0  | 1875 | 0.5680          | 0.73     |
| 0.2156        | 76.0  | 1900 | 0.5799          | 0.73     |
| 0.2156        | 77.0  | 1925 | 0.5636          | 0.73     |
| 0.2156        | 78.0  | 1950 | 0.5738          | 0.73     |
| 0.2156        | 79.0  | 1975 | 0.5750          | 0.73     |
| 0.1940        | 80.0  | 2000 | 0.5758          | 0.73     |
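
The accuracy column was most likely produced by a `compute_metrics` callback passed to the `Trainer`. The card does not include that function, so the following is only a plausible sketch using the `evaluate` library.

```python
# Plausible sketch of an accuracy metric for Trainer; not the card's actual code.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```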

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
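
To check that a local environment matches the versions above, a small assertion sketch follows (the `+cu118` suffix implies a CUDA 11.8 PyTorch build).

```python
# Sanity-check the installed versions against those listed on the card.
import datasets
import tokenizers
import torch
import transformers

for module, prefix in [(transformers, "4.26"), (torch, "2.0.1"),
                       (datasets, "2.12"), (tokenizers, "0.13")]:
    assert module.__version__.startswith(prefix), (module.__name__, module.__version__)
```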
