
20230826035341

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6425
  • Accuracy: 0.66

Model description

More information needed

Intended uses & limitations

More information needed
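
In the absence of stated usage guidance, here is a minimal inference sketch. It assumes the checkpoint carries a sentence-pair sequence-classification head (SuperGLUE subtasks are mostly pair classification; the exact subtask is not stated in this card) and uses the repository id `dkqjrm/20230826035341` from this card. The example inputs are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826035341"  # repository id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# The SuperGLUE subtask is not stated in the card, so these inputs are
# illustrative placeholders for a premise/hypothesis pair.
inputs = tokenizer("premise text", "hypothesis text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```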

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after the list):

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
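
These values map directly onto `transformers.TrainingArguments`. Below is a minimal reproduction sketch, assuming the Hugging Face Trainer API was used. The SuperGLUE subtask and preprocessing are not named in this card, so `rte` is chosen purely for illustration; the Adam betas and epsilon listed above are the Trainer defaults, so they need no explicit arguments.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumption: "rte" stands in for the unnamed SuperGLUE subtask.
raw = load_dataset("super_glue", "rte")
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

def preprocess(batch):
    return tokenizer(batch["premise"], batch["hypothesis"], truncation=True)

tokenized = raw.map(preprocess, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-cased", num_labels=2
)

# Hyperparameters copied from the list above; Adam betas=(0.9, 0.999)
# and epsilon=1e-08 are the Trainer defaults.
args = TrainingArguments(
    output_dir="20230826035341",
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80,
    evaluation_strategy="epoch",  # assumption: the results table logs one eval per epoch
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables the default padding collator
).train()
```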

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.7174 | 0.41 |
| No log | 2.0 | 50 | 0.7533 | 0.48 |
| No log | 3.0 | 75 | 0.7732 | 0.61 |
| No log | 4.0 | 100 | 0.6638 | 0.61 |
| No log | 5.0 | 125 | 0.6407 | 0.49 |
| No log | 6.0 | 150 | 0.7758 | 0.37 |
| No log | 7.0 | 175 | 0.5933 | 0.67 |
| No log | 8.0 | 200 | 0.6227 | 0.67 |
| No log | 9.0 | 225 | 0.6462 | 0.66 |
| No log | 10.0 | 250 | 0.6520 | 0.65 |
| No log | 11.0 | 275 | 0.5907 | 0.64 |
| No log | 12.0 | 300 | 0.6254 | 0.64 |
| No log | 13.0 | 325 | 0.6457 | 0.64 |
| No log | 14.0 | 350 | 0.5731 | 0.65 |
| No log | 15.0 | 375 | 0.6088 | 0.65 |
| No log | 16.0 | 400 | 0.5722 | 0.65 |
| No log | 17.0 | 425 | 0.6187 | 0.64 |
| No log | 18.0 | 450 | 0.6932 | 0.65 |
| No log | 19.0 | 475 | 0.6068 | 0.66 |
| 0.7336 | 20.0 | 500 | 0.5740 | 0.68 |
| 0.7336 | 21.0 | 525 | 0.5791 | 0.66 |
| 0.7336 | 22.0 | 550 | 0.7415 | 0.65 |
| 0.7336 | 23.0 | 575 | 0.6275 | 0.64 |
| 0.7336 | 24.0 | 600 | 0.6515 | 0.65 |
| 0.7336 | 25.0 | 625 | 0.6619 | 0.66 |
| 0.7336 | 26.0 | 650 | 0.7296 | 0.63 |
| 0.7336 | 27.0 | 675 | 0.6984 | 0.65 |
| 0.7336 | 28.0 | 700 | 0.7813 | 0.68 |
| 0.7336 | 29.0 | 725 | 0.7499 | 0.68 |
| 0.7336 | 30.0 | 750 | 0.8273 | 0.66 |
| 0.7336 | 31.0 | 775 | 0.7841 | 0.67 |
| 0.7336 | 32.0 | 800 | 0.7399 | 0.67 |
| 0.7336 | 33.0 | 825 | 0.6789 | 0.67 |
| 0.7336 | 34.0 | 850 | 0.7219 | 0.68 |
| 0.7336 | 35.0 | 875 | 0.7323 | 0.68 |
| 0.7336 | 36.0 | 900 | 0.7056 | 0.69 |
| 0.7336 | 37.0 | 925 | 0.6669 | 0.68 |
| 0.7336 | 38.0 | 950 | 0.6746 | 0.67 |
| 0.7336 | 39.0 | 975 | 0.6932 | 0.69 |
| 0.371 | 40.0 | 1000 | 0.6695 | 0.68 |
| 0.371 | 41.0 | 1025 | 0.7091 | 0.68 |
| 0.371 | 42.0 | 1050 | 0.6842 | 0.65 |
| 0.371 | 43.0 | 1075 | 0.6724 | 0.66 |
| 0.371 | 44.0 | 1100 | 0.6938 | 0.67 |
| 0.371 | 45.0 | 1125 | 0.6779 | 0.67 |
| 0.371 | 46.0 | 1150 | 0.6894 | 0.67 |
| 0.371 | 47.0 | 1175 | 0.6746 | 0.65 |
| 0.371 | 48.0 | 1200 | 0.7162 | 0.67 |
| 0.371 | 49.0 | 1225 | 0.6892 | 0.66 |
| 0.371 | 50.0 | 1250 | 0.6888 | 0.64 |
| 0.371 | 51.0 | 1275 | 0.6493 | 0.67 |
| 0.371 | 52.0 | 1300 | 0.6620 | 0.66 |
| 0.371 | 53.0 | 1325 | 0.6613 | 0.65 |
| 0.371 | 54.0 | 1350 | 0.6567 | 0.66 |
| 0.371 | 55.0 | 1375 | 0.6890 | 0.67 |
| 0.371 | 56.0 | 1400 | 0.6884 | 0.67 |
| 0.371 | 57.0 | 1425 | 0.6547 | 0.66 |
| 0.371 | 58.0 | 1450 | 0.6831 | 0.66 |
| 0.371 | 59.0 | 1475 | 0.6529 | 0.66 |
| 0.2458 | 60.0 | 1500 | 0.6793 | 0.67 |
| 0.2458 | 61.0 | 1525 | 0.6769 | 0.67 |
| 0.2458 | 62.0 | 1550 | 0.6766 | 0.67 |
| 0.2458 | 63.0 | 1575 | 0.6511 | 0.66 |
| 0.2458 | 64.0 | 1600 | 0.6574 | 0.67 |
| 0.2458 | 65.0 | 1625 | 0.6445 | 0.66 |
| 0.2458 | 66.0 | 1650 | 0.6468 | 0.67 |
| 0.2458 | 67.0 | 1675 | 0.6413 | 0.66 |
| 0.2458 | 68.0 | 1700 | 0.6591 | 0.67 |
| 0.2458 | 69.0 | 1725 | 0.6374 | 0.67 |
| 0.2458 | 70.0 | 1750 | 0.6688 | 0.66 |
| 0.2458 | 71.0 | 1775 | 0.6512 | 0.66 |
| 0.2458 | 72.0 | 1800 | 0.6465 | 0.66 |
| 0.2458 | 73.0 | 1825 | 0.6602 | 0.66 |
| 0.2458 | 74.0 | 1850 | 0.6482 | 0.66 |
| 0.2458 | 75.0 | 1875 | 0.6434 | 0.66 |
| 0.2458 | 76.0 | 1900 | 0.6523 | 0.66 |
| 0.2458 | 77.0 | 1925 | 0.6502 | 0.66 |
| 0.2458 | 78.0 | 1950 | 0.6447 | 0.66 |
| 0.2458 | 79.0 | 1975 | 0.6427 | 0.66 |
| 0.2218 | 80.0 | 2000 | 0.6425 | 0.66 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
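
As a quick sanity check, a matching environment can be verified from Python against the version strings listed above:

```python
import datasets
import tokenizers
import torch
import transformers

# Expected versions for this run, per the list above.
print(transformers.__version__)  # 4.26.1
print(torch.__version__)         # 2.0.1+cu118
print(datasets.__version__)      # 2.12.0
print(tokenizers.__version__)    # 0.13.3
```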