
20230826022800

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4898
  • Accuracy: 0.75

Model description

More information needed

Intended uses & limitations

More information needed
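
No pipeline type is set for this checkpoint, so the task head is undocumented. A minimal loading sketch, assuming a sequence-classification head; the repo id dkqjrm/20230826022800 is the model's name on the Hub:

```python
# Minimal loading sketch. The task head is not documented in this card;
# AutoModelForSequenceClassification is an assumption. The sentence-pair
# inputs below are placeholders, since the SuperGLUE task is unspecified.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dkqjrm/20230826022800")
model = AutoModelForSequenceClassification.from_pretrained("dkqjrm/20230826022800")

inputs = tokenizer("An example premise.", "An example hypothesis.", return_tensors="pt")
logits = model(**inputs).logits  # class probabilities via logits.softmax(-1)
```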

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
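
A minimal sketch of this configuration with the transformers Trainer API. Only the arguments listed above are reproduced; the SuperGLUE task, preprocessing, and model head are not documented in this card, and the commented values are assumptions:

```python
# Configuration sketch mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826022800",   # output path is an assumption (model name)
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",   # assumption; matches the per-epoch rows below
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's
    # default optimizer setting, so no extra arguments are needed for it.
)
```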

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.5778 | 0.38 |
| No log | 2.0 | 50 | 0.5810 | 0.66 |
| No log | 3.0 | 75 | 0.6271 | 0.65 |
| No log | 4.0 | 100 | 0.5772 | 0.64 |
| No log | 5.0 | 125 | 0.5290 | 0.62 |
| No log | 6.0 | 150 | 0.5352 | 0.62 |
| No log | 7.0 | 175 | 0.5322 | 0.61 |
| No log | 8.0 | 200 | 0.5976 | 0.64 |
| No log | 9.0 | 225 | 0.5290 | 0.61 |
| No log | 10.0 | 250 | 0.5700 | 0.66 |
| No log | 11.0 | 275 | 0.5132 | 0.66 |
| No log | 12.0 | 300 | 0.5155 | 0.64 |
| No log | 13.0 | 325 | 0.5049 | 0.67 |
| No log | 14.0 | 350 | 0.5078 | 0.67 |
| No log | 15.0 | 375 | 0.4821 | 0.68 |
| No log | 16.0 | 400 | 0.5371 | 0.7 |
| No log | 17.0 | 425 | 0.5407 | 0.69 |
| No log | 18.0 | 450 | 0.6441 | 0.71 |
| No log | 19.0 | 475 | 0.5787 | 0.7 |
| 0.6402 | 20.0 | 500 | 0.5646 | 0.68 |
| 0.6402 | 21.0 | 525 | 0.5553 | 0.71 |
| 0.6402 | 22.0 | 550 | 0.6137 | 0.72 |
| 0.6402 | 23.0 | 575 | 0.4948 | 0.71 |
| 0.6402 | 24.0 | 600 | 0.5510 | 0.72 |
| 0.6402 | 25.0 | 625 | 0.5985 | 0.7 |
| 0.6402 | 26.0 | 650 | 0.5660 | 0.71 |
| 0.6402 | 27.0 | 675 | 0.5232 | 0.71 |
| 0.6402 | 28.0 | 700 | 0.5381 | 0.71 |
| 0.6402 | 29.0 | 725 | 0.5234 | 0.71 |
| 0.6402 | 30.0 | 750 | 0.6145 | 0.71 |
| 0.6402 | 31.0 | 775 | 0.5482 | 0.73 |
| 0.6402 | 32.0 | 800 | 0.5246 | 0.72 |
| 0.6402 | 33.0 | 825 | 0.5258 | 0.71 |
| 0.6402 | 34.0 | 850 | 0.5278 | 0.72 |
| 0.6402 | 35.0 | 875 | 0.5245 | 0.72 |
| 0.6402 | 36.0 | 900 | 0.5073 | 0.72 |
| 0.6402 | 37.0 | 925 | 0.4983 | 0.72 |
| 0.6402 | 38.0 | 950 | 0.5077 | 0.73 |
| 0.6402 | 39.0 | 975 | 0.5263 | 0.73 |
| 0.3719 | 40.0 | 1000 | 0.5096 | 0.73 |
| 0.3719 | 41.0 | 1025 | 0.5339 | 0.73 |
| 0.3719 | 42.0 | 1050 | 0.4964 | 0.75 |
| 0.3719 | 43.0 | 1075 | 0.4832 | 0.73 |
| 0.3719 | 44.0 | 1100 | 0.4940 | 0.73 |
| 0.3719 | 45.0 | 1125 | 0.4982 | 0.72 |
| 0.3719 | 46.0 | 1150 | 0.5449 | 0.73 |
| 0.3719 | 47.0 | 1175 | 0.5175 | 0.73 |
| 0.3719 | 48.0 | 1200 | 0.5208 | 0.74 |
| 0.3719 | 49.0 | 1225 | 0.5281 | 0.74 |
| 0.3719 | 50.0 | 1250 | 0.4940 | 0.76 |
| 0.3719 | 51.0 | 1275 | 0.5020 | 0.74 |
| 0.3719 | 52.0 | 1300 | 0.5010 | 0.74 |
| 0.3719 | 53.0 | 1325 | 0.4799 | 0.73 |
| 0.3719 | 54.0 | 1350 | 0.5206 | 0.74 |
| 0.3719 | 55.0 | 1375 | 0.5148 | 0.75 |
| 0.3719 | 56.0 | 1400 | 0.4815 | 0.74 |
| 0.3719 | 57.0 | 1425 | 0.4951 | 0.74 |
| 0.3719 | 58.0 | 1450 | 0.5077 | 0.74 |
| 0.3719 | 59.0 | 1475 | 0.5000 | 0.74 |
| 0.3121 | 60.0 | 1500 | 0.5124 | 0.75 |
| 0.3121 | 61.0 | 1525 | 0.4891 | 0.76 |
| 0.3121 | 62.0 | 1550 | 0.4994 | 0.75 |
| 0.3121 | 63.0 | 1575 | 0.4947 | 0.75 |
| 0.3121 | 64.0 | 1600 | 0.4833 | 0.74 |
| 0.3121 | 65.0 | 1625 | 0.5135 | 0.75 |
| 0.3121 | 66.0 | 1650 | 0.4803 | 0.75 |
| 0.3121 | 67.0 | 1675 | 0.5058 | 0.75 |
| 0.3121 | 68.0 | 1700 | 0.4840 | 0.75 |
| 0.3121 | 69.0 | 1725 | 0.5051 | 0.75 |
| 0.3121 | 70.0 | 1750 | 0.4883 | 0.74 |
| 0.3121 | 71.0 | 1775 | 0.4972 | 0.74 |
| 0.3121 | 72.0 | 1800 | 0.4789 | 0.74 |
| 0.3121 | 73.0 | 1825 | 0.4984 | 0.74 |
| 0.3121 | 74.0 | 1850 | 0.4913 | 0.74 |
| 0.3121 | 75.0 | 1875 | 0.4879 | 0.74 |
| 0.3121 | 76.0 | 1900 | 0.4902 | 0.74 |
| 0.3121 | 77.0 | 1925 | 0.4856 | 0.74 |
| 0.3121 | 78.0 | 1950 | 0.4893 | 0.74 |
| 0.3121 | 79.0 | 1975 | 0.4907 | 0.75 |
| 0.2906 | 80.0 | 2000 | 0.4898 | 0.75 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
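
To work in a matching environment, a quick version check against the list above (a convenience sketch, not from the card):

```python
# Check installed versions against those listed under "Framework versions".
import datasets, tokenizers, torch, transformers

expected = {
    transformers: "4.26.1",
    torch: "2.0.1+cu118",
    datasets: "2.12.0",
    tokenizers: "0.13.3",
}
for module, version in expected.items():
    assert module.__version__ == version, (module.__name__, module.__version__)
```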