
20230826073557

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the metrics):

  • Loss: 0.4014
  • Accuracy: 0.72
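The card does not state the model's pipeline type, so the sketch below assumes the checkpoint carries a sequence-classification head (consistent with the accuracy metric reported above); the input string is a placeholder, not an example from the training data.

```python
# Minimal usage sketch, assuming a sequence-classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826073557"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Placeholder input; replace with text formatted for the underlying task.
inputs = tokenizer("Replace this with task input.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```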

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
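A sketch of the reported hyperparameters expressed as transformers TrainingArguments. This assumes single-device training, so the reported train_batch_size maps to per_device_train_batch_size; output_dir is a placeholder, and the Adam settings shown match both the values reported above and the library defaults.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # placeholder path
    learning_rate=0.02,
    per_device_train_batch_size=16,    # reported train_batch_size: 16
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    adam_beta1=0.9,                    # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                 # epsilon=1e-08
)
```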

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.4958 | 0.46 |
| No log | 2.0 | 50 | 0.5956 | 0.54 |
| No log | 3.0 | 75 | 0.5377 | 0.45 |
| No log | 4.0 | 100 | 0.4202 | 0.61 |
| No log | 5.0 | 125 | 0.4367 | 0.44 |
| No log | 6.0 | 150 | 0.4370 | 0.51 |
| No log | 7.0 | 175 | 0.4207 | 0.66 |
| No log | 8.0 | 200 | 0.4423 | 0.58 |
| No log | 9.0 | 225 | 0.4107 | 0.61 |
| No log | 10.0 | 250 | 0.4332 | 0.64 |
| No log | 11.0 | 275 | 0.4055 | 0.6 |
| No log | 12.0 | 300 | 0.4376 | 0.63 |
| No log | 13.0 | 325 | 0.4062 | 0.57 |
| No log | 14.0 | 350 | 0.4000 | 0.61 |
| No log | 15.0 | 375 | 0.4052 | 0.63 |
| No log | 16.0 | 400 | 0.3961 | 0.68 |
| No log | 17.0 | 425 | 0.3976 | 0.67 |
| No log | 18.0 | 450 | 0.4186 | 0.65 |
| No log | 19.0 | 475 | 0.4304 | 0.63 |
| 0.731 | 20.0 | 500 | 0.4358 | 0.69 |
| 0.731 | 21.0 | 525 | 0.4135 | 0.68 |
| 0.731 | 22.0 | 550 | 0.4180 | 0.68 |
| 0.731 | 23.0 | 575 | 0.4627 | 0.66 |
| 0.731 | 24.0 | 600 | 0.4150 | 0.65 |
| 0.731 | 25.0 | 625 | 0.4005 | 0.67 |
| 0.731 | 26.0 | 650 | 0.4123 | 0.7 |
| 0.731 | 27.0 | 675 | 0.4342 | 0.69 |
| 0.731 | 28.0 | 700 | 0.4551 | 0.67 |
| 0.731 | 29.0 | 725 | 0.4222 | 0.69 |
| 0.731 | 30.0 | 750 | 0.4226 | 0.71 |
| 0.731 | 31.0 | 775 | 0.4702 | 0.69 |
| 0.731 | 32.0 | 800 | 0.4100 | 0.7 |
| 0.731 | 33.0 | 825 | 0.4318 | 0.69 |
| 0.731 | 34.0 | 850 | 0.4447 | 0.71 |
| 0.731 | 35.0 | 875 | 0.3881 | 0.72 |
| 0.731 | 36.0 | 900 | 0.4234 | 0.69 |
| 0.731 | 37.0 | 925 | 0.4869 | 0.69 |
| 0.731 | 38.0 | 950 | 0.4352 | 0.71 |
| 0.731 | 39.0 | 975 | 0.4465 | 0.71 |
| 0.5086 | 40.0 | 1000 | 0.4135 | 0.7 |
| 0.5086 | 41.0 | 1025 | 0.4061 | 0.7 |
| 0.5086 | 42.0 | 1050 | 0.4437 | 0.72 |
| 0.5086 | 43.0 | 1075 | 0.4461 | 0.72 |
| 0.5086 | 44.0 | 1100 | 0.4144 | 0.69 |
| 0.5086 | 45.0 | 1125 | 0.3973 | 0.71 |
| 0.5086 | 46.0 | 1150 | 0.4511 | 0.73 |
| 0.5086 | 47.0 | 1175 | 0.4273 | 0.71 |
| 0.5086 | 48.0 | 1200 | 0.4100 | 0.71 |
| 0.5086 | 49.0 | 1225 | 0.4209 | 0.72 |
| 0.5086 | 50.0 | 1250 | 0.4191 | 0.74 |
| 0.5086 | 51.0 | 1275 | 0.4023 | 0.74 |
| 0.5086 | 52.0 | 1300 | 0.4038 | 0.72 |
| 0.5086 | 53.0 | 1325 | 0.4148 | 0.73 |
| 0.5086 | 54.0 | 1350 | 0.4263 | 0.72 |
| 0.5086 | 55.0 | 1375 | 0.4331 | 0.73 |
| 0.5086 | 56.0 | 1400 | 0.4373 | 0.71 |
| 0.5086 | 57.0 | 1425 | 0.4081 | 0.72 |
| 0.5086 | 58.0 | 1450 | 0.4078 | 0.71 |
| 0.5086 | 59.0 | 1475 | 0.4250 | 0.72 |
| 0.4268 | 60.0 | 1500 | 0.4224 | 0.7 |
| 0.4268 | 61.0 | 1525 | 0.4255 | 0.7 |
| 0.4268 | 62.0 | 1550 | 0.4114 | 0.72 |
| 0.4268 | 63.0 | 1575 | 0.4266 | 0.72 |
| 0.4268 | 64.0 | 1600 | 0.4097 | 0.72 |
| 0.4268 | 65.0 | 1625 | 0.4053 | 0.72 |
| 0.4268 | 66.0 | 1650 | 0.4051 | 0.71 |
| 0.4268 | 67.0 | 1675 | 0.4135 | 0.73 |
| 0.4268 | 68.0 | 1700 | 0.3959 | 0.74 |
| 0.4268 | 69.0 | 1725 | 0.4162 | 0.72 |
| 0.4268 | 70.0 | 1750 | 0.4061 | 0.73 |
| 0.4268 | 71.0 | 1775 | 0.4016 | 0.71 |
| 0.4268 | 72.0 | 1800 | 0.4194 | 0.71 |
| 0.4268 | 73.0 | 1825 | 0.4098 | 0.72 |
| 0.4268 | 74.0 | 1850 | 0.4179 | 0.71 |
| 0.4268 | 75.0 | 1875 | 0.4105 | 0.71 |
| 0.4268 | 76.0 | 1900 | 0.4140 | 0.72 |
| 0.4268 | 77.0 | 1925 | 0.4081 | 0.73 |
| 0.4268 | 78.0 | 1950 | 0.4044 | 0.73 |
| 0.4268 | 79.0 | 1975 | 0.3996 | 0.72 |
| 0.3915 | 80.0 | 2000 | 0.4014 | 0.72 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
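
An optional sanity check that a local environment matches the versions listed above; the torch check uses a prefix match because the reported build carries a `+cu118` suffix.

```python
# Verify the environment matches the versions used for training.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.26.1"
assert torch.__version__.startswith("2.0.1")  # reported: 2.0.1+cu118
assert datasets.__version__ == "2.12.0"
assert tokenizers.__version__ == "0.13.3"
```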