
20230826052713

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a hedged usage sketch follows the results):

  • Loss: 0.4913
  • Accuracy: 0.72
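The card does not declare a pipeline type, so the snippet below is only a minimal usage sketch: it assumes the checkpoint carries a sequence-classification head (which matches the SuperGLUE task family) and uses the repo id dkqjrm/20230826052713 from this card. The specific SuperGLUE subset, and therefore the expected input format, is not stated, so the example inputs are placeholders.

```python
# Hedged sketch: the pipeline type is not declared on the card, so a
# sequence-classification head is assumed; the SuperGLUE subset (and thus
# the correct input format) is unknown.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_id = "dkqjrm/20230826052713"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

# Placeholder sentence pair; most SuperGLUE tasks take two text inputs.
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```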

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
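
For reference, here is a minimal sketch of how the settings above map onto Transformers TrainingArguments. The output_dir and the per-epoch evaluation strategy are assumptions (the latter inferred from the per-epoch rows in the results table below); everything else is copied from the list.

```python
# Hypothetical reconstruction of this run's TrainingArguments from the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826052713",   # assumption: not stated on the card
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption: inferred from the results table
)
```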

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6276 | 0.57 |
| No log | 2.0 | 50 | 0.6136 | 0.63 |
| No log | 3.0 | 75 | 0.6774 | 0.66 |
| No log | 4.0 | 100 | 0.5964 | 0.64 |
| No log | 5.0 | 125 | 0.5316 | 0.62 |
| No log | 6.0 | 150 | 0.5231 | 0.62 |
| No log | 7.0 | 175 | 0.5156 | 0.63 |
| No log | 8.0 | 200 | 0.6216 | 0.64 |
| No log | 9.0 | 225 | 0.5013 | 0.71 |
| No log | 10.0 | 250 | 0.5734 | 0.7 |
| No log | 11.0 | 275 | 0.4683 | 0.66 |
| No log | 12.0 | 300 | 0.5333 | 0.73 |
| No log | 13.0 | 325 | 0.6740 | 0.69 |
| No log | 14.0 | 350 | 0.5185 | 0.71 |
| No log | 15.0 | 375 | 0.5031 | 0.71 |
| No log | 16.0 | 400 | 0.5398 | 0.71 |
| No log | 17.0 | 425 | 0.5246 | 0.73 |
| No log | 18.0 | 450 | 0.7414 | 0.69 |
| No log | 19.0 | 475 | 0.6817 | 0.72 |
| 0.7352 | 20.0 | 500 | 0.6656 | 0.71 |
| 0.7352 | 21.0 | 525 | 0.5839 | 0.76 |
| 0.7352 | 22.0 | 550 | 0.6626 | 0.76 |
| 0.7352 | 23.0 | 575 | 0.5017 | 0.75 |
| 0.7352 | 24.0 | 600 | 0.5168 | 0.74 |
| 0.7352 | 25.0 | 625 | 0.5912 | 0.78 |
| 0.7352 | 26.0 | 650 | 0.5596 | 0.77 |
| 0.7352 | 27.0 | 675 | 0.4884 | 0.77 |
| 0.7352 | 28.0 | 700 | 0.4738 | 0.73 |
| 0.7352 | 29.0 | 725 | 0.5052 | 0.76 |
| 0.7352 | 30.0 | 750 | 0.6163 | 0.74 |
| 0.7352 | 31.0 | 775 | 0.5824 | 0.74 |
| 0.7352 | 32.0 | 800 | 0.4995 | 0.72 |
| 0.7352 | 33.0 | 825 | 0.4936 | 0.71 |
| 0.7352 | 34.0 | 850 | 0.5464 | 0.72 |
| 0.7352 | 35.0 | 875 | 0.5164 | 0.74 |
| 0.7352 | 36.0 | 900 | 0.5088 | 0.75 |
| 0.7352 | 37.0 | 925 | 0.5991 | 0.75 |
| 0.7352 | 38.0 | 950 | 0.4963 | 0.73 |
| 0.7352 | 39.0 | 975 | 0.5086 | 0.72 |
| 0.411 | 40.0 | 1000 | 0.5203 | 0.73 |
| 0.411 | 41.0 | 1025 | 0.5844 | 0.74 |
| 0.411 | 42.0 | 1050 | 0.5285 | 0.74 |
| 0.411 | 43.0 | 1075 | 0.5553 | 0.74 |
| 0.411 | 44.0 | 1100 | 0.5588 | 0.71 |
| 0.411 | 45.0 | 1125 | 0.5392 | 0.72 |
| 0.411 | 46.0 | 1150 | 0.5494 | 0.72 |
| 0.411 | 47.0 | 1175 | 0.4982 | 0.76 |
| 0.411 | 48.0 | 1200 | 0.5374 | 0.72 |
| 0.411 | 49.0 | 1225 | 0.5730 | 0.73 |
| 0.411 | 50.0 | 1250 | 0.5149 | 0.72 |
| 0.411 | 51.0 | 1275 | 0.4949 | 0.72 |
| 0.411 | 52.0 | 1300 | 0.5295 | 0.73 |
| 0.411 | 53.0 | 1325 | 0.5223 | 0.72 |
| 0.411 | 54.0 | 1350 | 0.5617 | 0.71 |
| 0.411 | 55.0 | 1375 | 0.5373 | 0.72 |
| 0.411 | 56.0 | 1400 | 0.4857 | 0.73 |
| 0.411 | 57.0 | 1425 | 0.4954 | 0.72 |
| 0.411 | 58.0 | 1450 | 0.5024 | 0.72 |
| 0.411 | 59.0 | 1475 | 0.4971 | 0.74 |
| 0.318 | 60.0 | 1500 | 0.5265 | 0.73 |
| 0.318 | 61.0 | 1525 | 0.4967 | 0.71 |
| 0.318 | 62.0 | 1550 | 0.4972 | 0.73 |
| 0.318 | 63.0 | 1575 | 0.4908 | 0.72 |
| 0.318 | 64.0 | 1600 | 0.5056 | 0.74 |
| 0.318 | 65.0 | 1625 | 0.5231 | 0.74 |
| 0.318 | 66.0 | 1650 | 0.4737 | 0.75 |
| 0.318 | 67.0 | 1675 | 0.5016 | 0.72 |
| 0.318 | 68.0 | 1700 | 0.4988 | 0.73 |
| 0.318 | 69.0 | 1725 | 0.5276 | 0.74 |
| 0.318 | 70.0 | 1750 | 0.4912 | 0.73 |
| 0.318 | 71.0 | 1775 | 0.4865 | 0.72 |
| 0.318 | 72.0 | 1800 | 0.4754 | 0.73 |
| 0.318 | 73.0 | 1825 | 0.4922 | 0.73 |
| 0.318 | 74.0 | 1850 | 0.4884 | 0.74 |
| 0.318 | 75.0 | 1875 | 0.4868 | 0.73 |
| 0.318 | 76.0 | 1900 | 0.4872 | 0.73 |
| 0.318 | 77.0 | 1925 | 0.4848 | 0.72 |
| 0.318 | 78.0 | 1950 | 0.4923 | 0.72 |
| 0.318 | 79.0 | 1975 | 0.4888 | 0.73 |
| 0.287 | 80.0 | 2000 | 0.4913 | 0.72 |
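
The "No log" entries simply mean the trainer had not yet hit its step-based logging interval (training loss appears only at steps 500, 1000, 1500, and 2000, which is consistent with the default logging_steps=500). The Accuracy column is presumably produced by a compute_metrics callback passed to the Trainer; the actual callback is not published, so the following is only a plausible sketch (it additionally assumes the separate evaluate package, which is not in the version list below).

```python
# Hypothetical compute_metrics producing the Accuracy column; not the author's code.
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```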

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3