dkqjrm/20230826123019

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5900
  • Accuracy: 0.65
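
The card does not record the task or pipeline type. A minimal usage sketch, assuming the checkpoint exposes a standard sequence-classification head loadable through the transformers auto classes (the input pair below is illustrative, since the exact SuperGLUE subtask is not stated):

```python
# Hedged inference sketch. Assumptions: the repo id is dkqjrm/20230826123019
# and the checkpoint carries a sequence-classification head; the actual
# SuperGLUE subtask, and therefore the expected input format, is not
# documented in this card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826123019"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

# Illustrative sentence pair; most SuperGLUE tasks pair two text segments.
inputs = tokenizer("The cat sat on the mat.", "A cat is on a mat.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)
```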

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 11
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 80.0
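
These values map directly onto transformers TrainingArguments. A hedged sketch of an equivalent configuration; only the listed values come from the card, while the output directory and evaluation cadence are assumptions:

```python
# Sketch of the training configuration implied by the list above.
# Only the hyperparameter values are taken from the card; output_dir and
# evaluation_strategy are assumptions (per-epoch eval matches the results table).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="20230826123019",  # assumed name, not stated in the card
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
    evaluation_strategy="epoch",
)
```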

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 25 | 0.6011 | 0.66 |
| No log | 2.0 | 50 | 0.5991 | 0.65 |
| No log | 3.0 | 75 | 0.5983 | 0.65 |
| No log | 4.0 | 100 | 0.6063 | 0.65 |
| No log | 5.0 | 125 | 0.5973 | 0.65 |
| No log | 6.0 | 150 | 0.6049 | 0.65 |
| No log | 7.0 | 175 | 0.6031 | 0.65 |
| No log | 8.0 | 200 | 0.6001 | 0.65 |
| No log | 9.0 | 225 | 0.5969 | 0.64 |
| No log | 10.0 | 250 | 0.6007 | 0.65 |
| No log | 11.0 | 275 | 0.6016 | 0.65 |
| No log | 12.0 | 300 | 0.5992 | 0.65 |
| No log | 13.0 | 325 | 0.5968 | 0.65 |
| No log | 14.0 | 350 | 0.5968 | 0.65 |
| No log | 15.0 | 375 | 0.6000 | 0.65 |
| No log | 16.0 | 400 | 0.6000 | 0.65 |
| No log | 17.0 | 425 | 0.5883 | 0.66 |
| No log | 18.0 | 450 | 0.5920 | 0.65 |
| No log | 19.0 | 475 | 0.6035 | 0.62 |
| 0.6519 | 20.0 | 500 | 0.6075 | 0.64 |
| 0.6519 | 21.0 | 525 | 0.5919 | 0.65 |
| 0.6519 | 22.0 | 550 | 0.5951 | 0.63 |
| 0.6519 | 23.0 | 575 | 0.6037 | 0.61 |
| 0.6519 | 24.0 | 600 | 0.6058 | 0.62 |
| 0.6519 | 25.0 | 625 | 0.5944 | 0.65 |
| 0.6519 | 26.0 | 650 | 0.5938 | 0.65 |
| 0.6519 | 27.0 | 675 | 0.5909 | 0.66 |
| 0.6519 | 28.0 | 700 | 0.5914 | 0.65 |
| 0.6519 | 29.0 | 725 | 0.5902 | 0.66 |
| 0.6519 | 30.0 | 750 | 0.5906 | 0.66 |
| 0.6519 | 31.0 | 775 | 0.5936 | 0.65 |
| 0.6519 | 32.0 | 800 | 0.5960 | 0.66 |
| 0.6519 | 33.0 | 825 | 0.5953 | 0.65 |
| 0.6519 | 34.0 | 850 | 0.5970 | 0.65 |
| 0.6519 | 35.0 | 875 | 0.5937 | 0.65 |
| 0.6519 | 36.0 | 900 | 0.5954 | 0.64 |
| 0.6519 | 37.0 | 925 | 0.5993 | 0.63 |
| 0.6519 | 38.0 | 950 | 0.5905 | 0.65 |
| 0.6519 | 39.0 | 975 | 0.5898 | 0.65 |
| 0.6395 | 40.0 | 1000 | 0.5947 | 0.65 |
| 0.6395 | 41.0 | 1025 | 0.5966 | 0.64 |
| 0.6395 | 42.0 | 1050 | 0.5953 | 0.65 |
| 0.6395 | 43.0 | 1075 | 0.5968 | 0.64 |
| 0.6395 | 44.0 | 1100 | 0.5934 | 0.65 |
| 0.6395 | 45.0 | 1125 | 0.5948 | 0.66 |
| 0.6395 | 46.0 | 1150 | 0.5958 | 0.65 |
| 0.6395 | 47.0 | 1175 | 0.5928 | 0.65 |
| 0.6395 | 48.0 | 1200 | 0.5922 | 0.65 |
| 0.6395 | 49.0 | 1225 | 0.5929 | 0.65 |
| 0.6395 | 50.0 | 1250 | 0.5967 | 0.64 |
| 0.6395 | 51.0 | 1275 | 0.5908 | 0.65 |
| 0.6395 | 52.0 | 1300 | 0.5930 | 0.66 |
| 0.6395 | 53.0 | 1325 | 0.5910 | 0.65 |
| 0.6395 | 54.0 | 1350 | 0.5931 | 0.65 |
| 0.6395 | 55.0 | 1375 | 0.5900 | 0.66 |
| 0.6395 | 56.0 | 1400 | 0.5925 | 0.65 |
| 0.6395 | 57.0 | 1425 | 0.5938 | 0.66 |
| 0.6395 | 58.0 | 1450 | 0.5963 | 0.65 |
| 0.6395 | 59.0 | 1475 | 0.5955 | 0.64 |
| 0.6331 | 60.0 | 1500 | 0.5935 | 0.65 |
| 0.6331 | 61.0 | 1525 | 0.5937 | 0.66 |
| 0.6331 | 62.0 | 1550 | 0.5924 | 0.65 |
| 0.6331 | 63.0 | 1575 | 0.5909 | 0.65 |
| 0.6331 | 64.0 | 1600 | 0.5891 | 0.65 |
| 0.6331 | 65.0 | 1625 | 0.5881 | 0.65 |
| 0.6331 | 66.0 | 1650 | 0.5884 | 0.65 |
| 0.6331 | 67.0 | 1675 | 0.5893 | 0.65 |
| 0.6331 | 68.0 | 1700 | 0.5900 | 0.65 |
| 0.6331 | 69.0 | 1725 | 0.5908 | 0.65 |
| 0.6331 | 70.0 | 1750 | 0.5912 | 0.65 |
| 0.6331 | 71.0 | 1775 | 0.5914 | 0.65 |
| 0.6331 | 72.0 | 1800 | 0.5901 | 0.65 |
| 0.6331 | 73.0 | 1825 | 0.5898 | 0.65 |
| 0.6331 | 74.0 | 1850 | 0.5896 | 0.65 |
| 0.6331 | 75.0 | 1875 | 0.5905 | 0.65 |
| 0.6331 | 76.0 | 1900 | 0.5901 | 0.65 |
| 0.6331 | 77.0 | 1925 | 0.5901 | 0.65 |
| 0.6331 | 78.0 | 1950 | 0.5900 | 0.65 |
| 0.6331 | 79.0 | 1975 | 0.5900 | 0.65 |
| 0.6276 | 80.0 | 2000 | 0.5900 | 0.65 |
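
Validation accuracy stays near 0.65 across all 80 epochs; 25 steps per epoch at batch size 16 implies roughly 400 training examples, but the SuperGLUE subtask itself is not recorded. A hedged sketch of how the reported accuracy could be recomputed, with the subtask name left as an explicit placeholder:

```python
# Evaluation sketch. The SuperGLUE subtask is NOT documented in this card;
# "rte" is a hypothetical placeholder chosen only because its
# premise/hypothesis format is easy to show.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "dkqjrm/20230826123019"
subset = "rte"  # hypothetical: replace with the actual subtask

ds = load_dataset("super_glue", subset, split="validation")
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

correct = 0
for ex in ds:
    inputs = tokenizer(ex["premise"], ex["hypothesis"],
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == ex["label"])
print(f"accuracy = {correct / len(ds):.2f}")
```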

Framework versions

  • Transformers 4.26.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3
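
A small check that a local environment matches these pins; the version strings come from the list above, and the check itself is just one way to verify them:

```python
# Compare installed package versions against the ones this card reports.
from importlib.metadata import version

expected = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}
for pkg, want in expected.items():
    have = version(pkg)
    status = "OK" if have == want else f"MISMATCH (installed {have})"
    print(f"{pkg}=={want}: {status}")
```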