
20230826130711

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2867
  • Accuracy: 0.62
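
The pipeline type is not recorded for this checkpoint, but the accuracy metric suggests a standard sequence-classification fine-tuning setup. A minimal inference sketch under that assumption (the example sentence pair is illustrative only):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the checkpoint carries a sequence-classification head;
# the pipeline type is not recorded on this model card.
model_name = "dkqjrm/20230826130711"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Illustrative sentence pair; the actual SuperGLUE task format is not recorded.
inputs = tokenizer("Example premise.", "Example hypothesis.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```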

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
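
The card does not record which SuperGLUE task was used. For illustration only, a sketch of loading one SuperGLUE config with the datasets library; the config name ("copa" below) is a placeholder, not a confirmed detail of this model:

```python
from datasets import load_dataset

# Placeholder config -- the actual SuperGLUE task used for this model is not recorded.
raw = load_dataset("super_glue", "copa")
print(raw)  # DatasetDict with train/validation/test splits
```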

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
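
A minimal sketch of how these hyperparameters map onto Transformers TrainingArguments. The listed Adam betas and epsilon are the optimizer defaults, so they need no explicit arguments; output_dir and the per-epoch evaluation setting are assumptions (the latter inferred from the per-epoch validation results below), not recorded details:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters reported above.
training_args = TrainingArguments(
    output_dir="./results",          # placeholder, not recorded on the card
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",     # assumed; matches the per-epoch results table
)
```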

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 25   | 0.2952          | 0.64     |
| No log        | 2.0   | 50   | 0.2895          | 0.57     |
| No log        | 3.0   | 75   | 0.2922          | 0.61     |
| No log        | 4.0   | 100  | 0.2938          | 0.64     |
| No log        | 5.0   | 125  | 0.2885          | 0.63     |
| No log        | 6.0   | 150  | 0.2945          | 0.48     |
| No log        | 7.0   | 175  | 0.2860          | 0.67     |
| No log        | 8.0   | 200  | 0.2888          | 0.66     |
| No log        | 9.0   | 225  | 0.2894          | 0.51     |
| No log        | 10.0  | 250  | 0.2903          | 0.56     |
| No log        | 11.0  | 275  | 0.2868          | 0.66     |
| No log        | 12.0  | 300  | 0.2880          | 0.66     |
| No log        | 13.0  | 325  | 0.2947          | 0.54     |
| No log        | 14.0  | 350  | 0.2957          | 0.64     |
| No log        | 15.0  | 375  | 0.2877          | 0.66     |
| No log        | 16.0  | 400  | 0.2865          | 0.68     |
| No log        | 17.0  | 425  | 0.2850          | 0.69     |
| No log        | 18.0  | 450  | 0.2846          | 0.66     |
| No log        | 19.0  | 475  | 0.2911          | 0.59     |
| 0.4684        | 20.0  | 500  | 0.2961          | 0.64     |
| 0.4684        | 21.0  | 525  | 0.2872          | 0.63     |
| 0.4684        | 22.0  | 550  | 0.2880          | 0.64     |
| 0.4684        | 23.0  | 575  | 0.2951          | 0.51     |
| 0.4684        | 24.0  | 600  | 0.2897          | 0.64     |
| 0.4684        | 25.0  | 625  | 0.2884          | 0.64     |
| 0.4684        | 26.0  | 650  | 0.2895          | 0.64     |
| 0.4684        | 27.0  | 675  | 0.2872          | 0.61     |
| 0.4684        | 28.0  | 700  | 0.2890          | 0.64     |
| 0.4684        | 29.0  | 725  | 0.2887          | 0.66     |
| 0.4684        | 30.0  | 750  | 0.2886          | 0.63     |
| 0.4684        | 31.0  | 775  | 0.2875          | 0.6      |
| 0.4684        | 32.0  | 800  | 0.2882          | 0.65     |
| 0.4684        | 33.0  | 825  | 0.2886          | 0.58     |
| 0.4684        | 34.0  | 850  | 0.2970          | 0.64     |
| 0.4684        | 35.0  | 875  | 0.2875          | 0.59     |
| 0.4684        | 36.0  | 900  | 0.2888          | 0.63     |
| 0.4684        | 37.0  | 925  | 0.2868          | 0.63     |
| 0.4684        | 38.0  | 950  | 0.2863          | 0.64     |
| 0.4684        | 39.0  | 975  | 0.2911          | 0.63     |
| 0.4634        | 40.0  | 1000 | 0.2867          | 0.63     |
| 0.4634        | 41.0  | 1025 | 0.2936          | 0.54     |
| 0.4634        | 42.0  | 1050 | 0.2965          | 0.6      |
| 0.4634        | 43.0  | 1075 | 0.2872          | 0.62     |
| 0.4634        | 44.0  | 1100 | 0.2862          | 0.65     |
| 0.4634        | 45.0  | 1125 | 0.2871          | 0.65     |
| 0.4634        | 46.0  | 1150 | 0.2914          | 0.63     |
| 0.4634        | 47.0  | 1175 | 0.2925          | 0.64     |
| 0.4634        | 48.0  | 1200 | 0.2883          | 0.64     |
| 0.4634        | 49.0  | 1225 | 0.2896          | 0.65     |
| 0.4634        | 50.0  | 1250 | 0.2866          | 0.64     |
| 0.4634        | 51.0  | 1275 | 0.2857          | 0.64     |
| 0.4634        | 52.0  | 1300 | 0.2892          | 0.64     |
| 0.4634        | 53.0  | 1325 | 0.2861          | 0.65     |
| 0.4634        | 54.0  | 1350 | 0.2861          | 0.63     |
| 0.4634        | 55.0  | 1375 | 0.2872          | 0.65     |
| 0.4634        | 56.0  | 1400 | 0.2861          | 0.64     |
| 0.4634        | 57.0  | 1425 | 0.2865          | 0.65     |
| 0.4634        | 58.0  | 1450 | 0.2880          | 0.63     |
| 0.4634        | 59.0  | 1475 | 0.2898          | 0.63     |
| 0.4583        | 60.0  | 1500 | 0.2900          | 0.63     |
| 0.4583        | 61.0  | 1525 | 0.2896          | 0.64     |
| 0.4583        | 62.0  | 1550 | 0.2886          | 0.63     |
| 0.4583        | 63.0  | 1575 | 0.2888          | 0.63     |
| 0.4583        | 64.0  | 1600 | 0.2891          | 0.64     |
| 0.4583        | 65.0  | 1625 | 0.2874          | 0.63     |
| 0.4583        | 66.0  | 1650 | 0.2875          | 0.62     |
| 0.4583        | 67.0  | 1675 | 0.2882          | 0.62     |
| 0.4583        | 68.0  | 1700 | 0.2863          | 0.62     |
| 0.4583        | 69.0  | 1725 | 0.2867          | 0.63     |
| 0.4583        | 70.0  | 1750 | 0.2865          | 0.64     |
| 0.4583        | 71.0  | 1775 | 0.2863          | 0.64     |
| 0.4583        | 72.0  | 1800 | 0.2862          | 0.64     |
| 0.4583        | 73.0  | 1825 | 0.2864          | 0.64     |
| 0.4583        | 74.0  | 1850 | 0.2862          | 0.64     |
| 0.4583        | 75.0  | 1875 | 0.2866          | 0.64     |
| 0.4583        | 76.0  | 1900 | 0.2868          | 0.63     |
| 0.4583        | 77.0  | 1925 | 0.2866          | 0.63     |
| 0.4583        | 78.0  | 1950 | 0.2867          | 0.63     |
| 0.4583        | 79.0  | 1975 | 0.2867          | 0.62     |
| 0.4597        | 80.0  | 2000 | 0.2867          | 0.62     |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
