20230826022810

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2767
  • Accuracy: 0.73
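
The card does not state which SuperGLUE task or pipeline type this checkpoint targets, so the snippet below is only a minimal sketch that loads it as a generic sequence-classification model; the sentence-pair input and label interpretation are assumptions, not documented behavior.

```python
# A minimal sketch, assuming this checkpoint carries a sequence-classification
# head on bert-large-cased (the pipeline type is not declared in the card).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "dkqjrm/20230826022810"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example sentence pair; many SuperGLUE tasks are pair classification,
# but the exact task for this model is not stated.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on a mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(pred)  # class index; label names depend on the (unstated) task
```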

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.01
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
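
For reference, these values correspond roughly to the Hugging Face `TrainingArguments` sketch below; `output_dir` and `evaluation_strategy` are illustrative assumptions (the per-epoch evaluation table suggests epoch-level evaluation), not values taken from the card.

```python
# A minimal sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./20230826022810",   # assumed; not stated in the card
    learning_rate=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    num_train_epochs=80.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    evaluation_strategy="epoch",     # assumed from the per-epoch results table
)
```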

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.4473 | 0.5 |
| No log | 2.0 | 50 | 0.3750 | 0.6 |
| No log | 3.0 | 75 | 0.3427 | 0.63 |
| No log | 4.0 | 100 | 0.2967 | 0.63 |
| No log | 5.0 | 125 | 0.2981 | 0.57 |
| No log | 6.0 | 150 | 0.3264 | 0.56 |
| No log | 7.0 | 175 | 0.2918 | 0.58 |
| No log | 8.0 | 200 | 0.3062 | 0.66 |
| No log | 9.0 | 225 | 0.2885 | 0.58 |
| No log | 10.0 | 250 | 0.2884 | 0.6 |
| No log | 11.0 | 275 | 0.2963 | 0.55 |
| No log | 12.0 | 300 | 0.2895 | 0.6 |
| No log | 13.0 | 325 | 0.2873 | 0.6 |
| No log | 14.0 | 350 | 0.2884 | 0.58 |
| No log | 15.0 | 375 | 0.2871 | 0.59 |
| No log | 16.0 | 400 | 0.2859 | 0.6 |
| No log | 17.0 | 425 | 0.2912 | 0.53 |
| No log | 18.0 | 450 | 0.2841 | 0.61 |
| No log | 19.0 | 475 | 0.2834 | 0.61 |
| 0.5493 | 20.0 | 500 | 0.2825 | 0.64 |
| 0.5493 | 21.0 | 525 | 0.2847 | 0.62 |
| 0.5493 | 22.0 | 550 | 0.2782 | 0.62 |
| 0.5493 | 23.0 | 575 | 0.2759 | 0.62 |
| 0.5493 | 24.0 | 600 | 0.2750 | 0.67 |
| 0.5493 | 25.0 | 625 | 0.2745 | 0.69 |
| 0.5493 | 26.0 | 650 | 0.2721 | 0.66 |
| 0.5493 | 27.0 | 675 | 0.2728 | 0.65 |
| 0.5493 | 28.0 | 700 | 0.2848 | 0.69 |
| 0.5493 | 29.0 | 725 | 0.2727 | 0.65 |
| 0.5493 | 30.0 | 750 | 0.2739 | 0.66 |
| 0.5493 | 31.0 | 775 | 0.2715 | 0.66 |
| 0.5493 | 32.0 | 800 | 0.2950 | 0.67 |
| 0.5493 | 33.0 | 825 | 0.2764 | 0.68 |
| 0.5493 | 34.0 | 850 | 0.2693 | 0.68 |
| 0.5493 | 35.0 | 875 | 0.2686 | 0.69 |
| 0.5493 | 36.0 | 900 | 0.2793 | 0.66 |
| 0.5493 | 37.0 | 925 | 0.2700 | 0.68 |
| 0.5493 | 38.0 | 950 | 0.2744 | 0.68 |
| 0.5493 | 39.0 | 975 | 0.2789 | 0.71 |
| 0.4987 | 40.0 | 1000 | 0.2757 | 0.7 |
| 0.4987 | 41.0 | 1025 | 0.2705 | 0.69 |
| 0.4987 | 42.0 | 1050 | 0.2836 | 0.7 |
| 0.4987 | 43.0 | 1075 | 0.2808 | 0.6 |
| 0.4987 | 44.0 | 1100 | 0.2734 | 0.71 |
| 0.4987 | 45.0 | 1125 | 0.2703 | 0.69 |
| 0.4987 | 46.0 | 1150 | 0.2787 | 0.72 |
| 0.4987 | 47.0 | 1175 | 0.2684 | 0.69 |
| 0.4987 | 48.0 | 1200 | 0.2737 | 0.7 |
| 0.4987 | 49.0 | 1225 | 0.2792 | 0.72 |
| 0.4987 | 50.0 | 1250 | 0.2737 | 0.71 |
| 0.4987 | 51.0 | 1275 | 0.2723 | 0.71 |
| 0.4987 | 52.0 | 1300 | 0.2725 | 0.73 |
| 0.4987 | 53.0 | 1325 | 0.2722 | 0.71 |
| 0.4987 | 54.0 | 1350 | 0.2800 | 0.7 |
| 0.4987 | 55.0 | 1375 | 0.2769 | 0.71 |
| 0.4987 | 56.0 | 1400 | 0.2772 | 0.76 |
| 0.4987 | 57.0 | 1425 | 0.2715 | 0.77 |
| 0.4987 | 58.0 | 1450 | 0.2794 | 0.75 |
| 0.4987 | 59.0 | 1475 | 0.2771 | 0.73 |
| 0.447 | 60.0 | 1500 | 0.2798 | 0.7 |
| 0.447 | 61.0 | 1525 | 0.2717 | 0.74 |
| 0.447 | 62.0 | 1550 | 0.2991 | 0.71 |
| 0.447 | 63.0 | 1575 | 0.2719 | 0.72 |
| 0.447 | 64.0 | 1600 | 0.2762 | 0.72 |
| 0.447 | 65.0 | 1625 | 0.2833 | 0.73 |
| 0.447 | 66.0 | 1650 | 0.2772 | 0.74 |
| 0.447 | 67.0 | 1675 | 0.2807 | 0.71 |
| 0.447 | 68.0 | 1700 | 0.2741 | 0.73 |
| 0.447 | 69.0 | 1725 | 0.2765 | 0.72 |
| 0.447 | 70.0 | 1750 | 0.2786 | 0.73 |
| 0.447 | 71.0 | 1775 | 0.2795 | 0.73 |
| 0.447 | 72.0 | 1800 | 0.2752 | 0.74 |
| 0.447 | 73.0 | 1825 | 0.2838 | 0.71 |
| 0.447 | 74.0 | 1850 | 0.2763 | 0.74 |
| 0.447 | 75.0 | 1875 | 0.2764 | 0.73 |
| 0.447 | 76.0 | 1900 | 0.2756 | 0.72 |
| 0.447 | 77.0 | 1925 | 0.2738 | 0.74 |
| 0.447 | 78.0 | 1950 | 0.2743 | 0.74 |
| 0.447 | 79.0 | 1975 | 0.2779 | 0.72 |
| 0.4199 | 80.0 | 2000 | 0.2767 | 0.73 |

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
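
To reproduce the reported numbers it may help to match these versions. Below is a minimal runtime check against them; requiring exact matches is an assumption, and nearby patch releases may work just as well.

```python
# A minimal environment check against the versions listed above.
# Exact matching is an assumption, not a documented requirement.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers.__version__, "4.26.1"),
    "torch": (torch.__version__, "2.0.1+cu118"),
    "datasets": (datasets.__version__, "2.12.0"),
    "tokenizers": (tokenizers.__version__, "0.13.3"),
}
for name, (found, want) in expected.items():
    if found != want:
        print(f"warning: {name} {found} differs from {want} reported in the card")
```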
