20230829194700

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.3338
Accuracy: 0.6346

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.003
train_batch_size: 16
eval_batch_size: 8
seed: 44
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
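
As a rough illustration of what the linear scheduler implies for these settings, the sketch below computes the learning rate at a given optimizer step. The 35 steps per epoch and the 2800 total steps are read from the training results table. The warmup length is not stated in this card, so `WARMUP_STEPS = 0` is an assumption, and `linear_lr` is a hypothetical helper rather than a library function.

```python
# Values taken from the hyperparameter list and the training results table.
BASE_LR = 0.003
NUM_EPOCHS = 80
STEPS_PER_EPOCH = 35  # the table shows step 35 at epoch 1.0
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 2800, the last step in the table
WARMUP_STEPS = 0  # assumption: the card does not report a warmup setting


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for epoch in (0, 20, 40, 60, 80):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:4d}  lr {linear_lr(step):.6f}")
```

Under these assumptions the rate falls in a straight line from 0.003 at step 0 to zero at step 2800, so it is about half its initial value at epoch 40.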

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	35	0.3465	0.5962
No log	2.0	70	0.3324	0.5769
No log	3.0	105	0.3771	0.6154
No log	4.0	140	0.3892	0.5481
No log	5.0	175	0.3777	0.5481
No log	6.0	210	0.4086	0.4615
No log	7.0	245	0.3702	0.4135
No log	8.0	280	0.3553	0.4519
No log	9.0	315	0.3547	0.4135
No log	10.0	350	0.6069	0.3942
No log	11.0	385	0.3542	0.4712
No log	12.0	420	0.4133	0.625
No log	13.0	455	0.4395	0.6346
No log	14.0	490	0.3549	0.6346
0.45	15.0	525	0.3869	0.4231
0.45	16.0	560	0.3793	0.6346
0.45	17.0	595	0.3991	0.4135
0.45	18.0	630	0.3405	0.6442
0.45	19.0	665	0.3948	0.4038
0.45	20.0	700	0.3327	0.6442
0.45	21.0	735	0.3452	0.6346
0.45	22.0	770	0.3510	0.5962
0.45	23.0	805	0.3443	0.625
0.45	24.0	840	0.3563	0.6346
0.45	25.0	875	0.5409	0.3846
0.45	26.0	910	0.3971	0.4519
0.45	27.0	945	0.7386	0.4038
0.45	28.0	980	0.3423	0.6058
0.4313	29.0	1015	0.3482	0.5096
0.4313	30.0	1050	0.3383	0.5769
0.4313	31.0	1085	0.5153	0.4038
0.4313	32.0	1120	0.6008	0.3654
0.4313	33.0	1155	0.4639	0.6346
0.4313	34.0	1190	0.3641	0.6346
0.4313	35.0	1225	0.3407	0.5577
0.4313	36.0	1260	0.3406	0.5769
0.4313	37.0	1295	0.3353	0.6346
0.4313	38.0	1330	0.3465	0.6346
0.4313	39.0	1365	0.3408	0.6346
0.4313	40.0	1400	0.3325	0.625
0.4313	41.0	1435	0.3983	0.3942
0.4313	42.0	1470	0.3435	0.5577
0.3946	43.0	1505	0.3315	0.6346
0.3946	44.0	1540	0.3454	0.5577
0.3946	45.0	1575	0.3314	0.6346
0.3946	46.0	1610	0.3326	0.6346
0.3946	47.0	1645	0.3506	0.5385
0.3946	48.0	1680	0.3370	0.6154
0.3946	49.0	1715	0.3354	0.6346
0.3946	50.0	1750	0.3302	0.6442
0.3946	51.0	1785	0.3400	0.5865
0.3946	52.0	1820	0.3844	0.4423
0.3946	53.0	1855	0.3378	0.6058
0.3946	54.0	1890	0.3673	0.4327
0.3946	55.0	1925	0.3340	0.6346
0.3946	56.0	1960	0.3464	0.6346
0.3946	57.0	1995	0.3565	0.5192
0.375	58.0	2030	0.3356	0.6346
0.375	59.0	2065	0.4202	0.3942
0.375	60.0	2100	0.3495	0.6442
0.375	61.0	2135	0.3374	0.6346
0.375	62.0	2170	0.3323	0.6635
0.375	63.0	2205	0.3362	0.6731
0.375	64.0	2240	0.3767	0.6346
0.375	65.0	2275	0.3345	0.6346
0.375	66.0	2310	0.3451	0.6346
0.375	67.0	2345	0.3403	0.6058
0.375	68.0	2380	0.3347	0.6538
0.375	69.0	2415	0.3419	0.6346
0.375	70.0	2450	0.3479	0.6346
0.375	71.0	2485	0.3330	0.6346
0.3666	72.0	2520	0.3442	0.5192
0.3666	73.0	2555	0.3335	0.6346
0.3666	74.0	2590	0.3474	0.4615
0.3666	75.0	2625	0.3364	0.6538
0.3666	76.0	2660	0.3368	0.6538
0.3666	77.0	2695	0.3498	0.5192
0.3666	78.0	2730	0.3407	0.5577
0.3666	79.0	2765	0.3352	0.6346
0.3666	80.0	2800	0.3338	0.6346
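
The results reported at the top of this card match the final row (epoch 80), not the best row of the table. A minimal sketch of how the table could be summarized follows. `ROWS` is a hand-copied excerpt of the table, not the full log, and the variable names are illustrative.

```python
# Excerpt of rows from the table above: (epoch, step, validation_loss, accuracy).
ROWS = [
    (1, 35, 0.3465, 0.5962),
    (20, 700, 0.3327, 0.6442),
    (45, 1575, 0.3314, 0.6346),
    (50, 1750, 0.3302, 0.6442),
    (63, 2205, 0.3362, 0.6731),
    (80, 2800, 0.3338, 0.6346),
]

# Pick the checkpoint with the lowest validation loss and the one with the
# highest accuracy, then compare both with the final epoch.
best_by_loss = min(ROWS, key=lambda row: row[2])
best_by_acc = max(ROWS, key=lambda row: row[3])
final = ROWS[-1]

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest accuracy:      ", best_by_acc)
    print("final epoch:           ", final)
```

Over the full table, the lowest validation loss is 0.3302 (epoch 50) and the highest accuracy is 0.6731 (epoch 63), while the final epoch reports a loss of 0.3338 and an accuracy of 0.6346.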

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
