20230826051154

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 0.2897
  • Accuracy: 0.7
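
A minimal sketch of loading this checkpoint for inference. The card does not declare a pipeline type, so the sequence-classification head is an assumption based on the super_glue fine-tuning and the accuracy metric above:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model ID from this card. The sequence-classification head is an
# assumption; the card does not state the pipeline type.
model_id = "dkqjrm/20230826051154"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Example sentence to classify.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```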

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.02
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
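
A sketch of how the values above map onto transformers.TrainingArguments. The output directory is a placeholder, and anything the card does not list is left at its default:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826051154",  # placeholder; not stated in the card
    learning_rate=0.02,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                 # Adam betas from the card
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",    # assumption, inferred from the per-epoch eval log below
)
```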

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:---:|:---:|:---:|:---:|:---:|
| No log | 1.0 | 25 | 0.4805 | 0.57 |
| No log | 2.0 | 50 | 0.3152 | 0.59 |
| No log | 3.0 | 75 | 0.3020 | 0.62 |
| No log | 4.0 | 100 | 0.2893 | 0.59 |
| No log | 5.0 | 125 | 0.2988 | 0.4 |
| No log | 6.0 | 150 | 0.2916 | 0.57 |
| No log | 7.0 | 175 | 0.2947 | 0.62 |
| No log | 8.0 | 200 | 0.2888 | 0.61 |
| No log | 9.0 | 225 | 0.2915 | 0.53 |
| No log | 10.0 | 250 | 0.2938 | 0.63 |
| No log | 11.0 | 275 | 0.2985 | 0.36 |
| No log | 12.0 | 300 | 0.2854 | 0.65 |
| No log | 13.0 | 325 | 0.2870 | 0.49 |
| No log | 14.0 | 350 | 0.2802 | 0.64 |
| No log | 15.0 | 375 | 0.2801 | 0.61 |
| No log | 16.0 | 400 | 0.2806 | 0.63 |
| No log | 17.0 | 425 | 0.2810 | 0.6 |
| No log | 18.0 | 450 | 0.2888 | 0.66 |
| No log | 19.0 | 475 | 0.2780 | 0.63 |
| 0.6923 | 20.0 | 500 | 0.2803 | 0.6 |
| 0.6923 | 21.0 | 525 | 0.2768 | 0.65 |
| 0.6923 | 22.0 | 550 | 0.2744 | 0.65 |
| 0.6923 | 23.0 | 575 | 0.2831 | 0.66 |
| 0.6923 | 24.0 | 600 | 0.2743 | 0.67 |
| 0.6923 | 25.0 | 625 | 0.2847 | 0.69 |
| 0.6923 | 26.0 | 650 | 0.2737 | 0.71 |
| 0.6923 | 27.0 | 675 | 0.2817 | 0.65 |
| 0.6923 | 28.0 | 700 | 0.2770 | 0.68 |
| 0.6923 | 29.0 | 725 | 0.2887 | 0.67 |
| 0.6923 | 30.0 | 750 | 0.2780 | 0.64 |
| 0.6923 | 31.0 | 775 | 0.2707 | 0.66 |
| 0.6923 | 32.0 | 800 | 0.2889 | 0.7 |
| 0.6923 | 33.0 | 825 | 0.2821 | 0.68 |
| 0.6923 | 34.0 | 850 | 0.2735 | 0.7 |
| 0.6923 | 35.0 | 875 | 0.2772 | 0.66 |
| 0.6923 | 36.0 | 900 | 0.2766 | 0.67 |
| 0.6923 | 37.0 | 925 | 0.2862 | 0.68 |
| 0.6923 | 38.0 | 950 | 0.2745 | 0.65 |
| 0.6923 | 39.0 | 975 | 0.2828 | 0.66 |
| 0.5864 | 40.0 | 1000 | 0.3264 | 0.68 |
| 0.5864 | 41.0 | 1025 | 0.2750 | 0.68 |
| 0.5864 | 42.0 | 1050 | 0.2831 | 0.67 |
| 0.5864 | 43.0 | 1075 | 0.2725 | 0.67 |
| 0.5864 | 44.0 | 1100 | 0.2909 | 0.68 |
| 0.5864 | 45.0 | 1125 | 0.2841 | 0.69 |
| 0.5864 | 46.0 | 1150 | 0.3126 | 0.69 |
| 0.5864 | 47.0 | 1175 | 0.2892 | 0.72 |
| 0.5864 | 48.0 | 1200 | 0.2887 | 0.7 |
| 0.5864 | 49.0 | 1225 | 0.2834 | 0.7 |
| 0.5864 | 50.0 | 1250 | 0.2731 | 0.66 |
| 0.5864 | 51.0 | 1275 | 0.2888 | 0.68 |
| 0.5864 | 52.0 | 1300 | 0.3080 | 0.67 |
| 0.5864 | 53.0 | 1325 | 0.2862 | 0.67 |
| 0.5864 | 54.0 | 1350 | 0.2772 | 0.67 |
| 0.5864 | 55.0 | 1375 | 0.2791 | 0.67 |
| 0.5864 | 56.0 | 1400 | 0.2930 | 0.68 |
| 0.5864 | 57.0 | 1425 | 0.2783 | 0.66 |
| 0.5864 | 58.0 | 1450 | 0.2855 | 0.67 |
| 0.5864 | 59.0 | 1475 | 0.2850 | 0.69 |
| 0.4926 | 60.0 | 1500 | 0.2899 | 0.69 |
| 0.4926 | 61.0 | 1525 | 0.2797 | 0.67 |
| 0.4926 | 62.0 | 1550 | 0.3322 | 0.69 |
| 0.4926 | 63.0 | 1575 | 0.2762 | 0.69 |
| 0.4926 | 64.0 | 1600 | 0.2816 | 0.7 |
| 0.4926 | 65.0 | 1625 | 0.2952 | 0.68 |
| 0.4926 | 66.0 | 1650 | 0.2794 | 0.68 |
| 0.4926 | 67.0 | 1675 | 0.2873 | 0.69 |
| 0.4926 | 68.0 | 1700 | 0.2835 | 0.69 |
| 0.4926 | 69.0 | 1725 | 0.2908 | 0.68 |
| 0.4926 | 70.0 | 1750 | 0.3008 | 0.68 |
| 0.4926 | 71.0 | 1775 | 0.2893 | 0.68 |
| 0.4926 | 72.0 | 1800 | 0.2826 | 0.68 |
| 0.4926 | 73.0 | 1825 | 0.2919 | 0.68 |
| 0.4926 | 74.0 | 1850 | 0.2832 | 0.7 |
| 0.4926 | 75.0 | 1875 | 0.2830 | 0.7 |
| 0.4926 | 76.0 | 1900 | 0.2809 | 0.69 |
| 0.4926 | 77.0 | 1925 | 0.2822 | 0.69 |
| 0.4926 | 78.0 | 1950 | 0.2884 | 0.69 |
| 0.4926 | 79.0 | 1975 | 0.2910 | 0.7 |
| 0.4369 | 80.0 | 2000 | 0.2897 | 0.7 |
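
The rows above are a standard per-epoch Trainer evaluation log (training loss is logged only every 500 steps, hence "No log" in the early rows). The training script is not published; a minimal accuracy hook of the kind that would produce the Accuracy column is sketched below as an illustrative reconstruction:

```python
import numpy as np

def compute_metrics(eval_pred):
    # Illustrative reconstruction, not the author's code: turn logits into
    # class predictions and report plain accuracy, as in the table above.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}
```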

Framework versions

  • Transformers 4.26.1
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3