# 20230830102630
This model is a fine-tuned version of [bert-large-cased](https://huggingface.co/bert-large-cased) on the super_glue dataset. It achieves the following results on the evaluation set:
- Loss: 0.6751
- Accuracy: 0.4984
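As a minimal usage sketch (not part of the original card): the example below assumes the checkpoint carries a binary sequence-classification head (most SuperGLUE tasks are two-class) and uses a placeholder repo id; adjust both to match the actual deployment.

```python
# Minimal inference sketch. Assumptions: binary sequence-classification
# head, and "your-username/20230830102630" is a placeholder repo id.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "your-username/20230830102630"  # placeholder, not a real repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# SuperGLUE-style sentence pair; the exact task and input format are
# not recorded in this card.
inputs = tokenizer(
    "The cat sat on the mat.",
    "A cat is on a mat.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index (0 or 1)
```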
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
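The card names only the super_glue dataset, not the specific task configuration. A hedged sketch of loading it with the `datasets` library follows; the "rte" configuration is an illustrative assumption.

```python
# Load SuperGLUE with the `datasets` library. The "rte" configuration
# is an assumption; the card does not record which task was used.
from datasets import load_dataset

dataset = load_dataset("super_glue", "rte")
print(dataset)              # splits and sizes
print(dataset["train"][0])  # inspect one example
```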
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch mirroring them follows the list):
- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 8
- seed: 11
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 80.0
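A minimal sketch of expressing these values with the `transformers` Trainer API. Only the values listed above come from this card; `output_dir` is a placeholder, and the Trainer's default AdamW matches the stated betas and epsilon.

```python
# TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",          # placeholder path, not from the card
    learning_rate=5e-4,              # 0.0005
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=11,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=80.0,
)
```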
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 340 | 0.6962 | 0.5 |
| 0.7003 | 2.0 | 680 | 0.6657 | 0.5345 |
| 0.6959 | 3.0 | 1020 | 0.6703 | 0.5 |
| 0.6959 | 4.0 | 1360 | 0.6845 | 0.5 |
| 0.6978 | 5.0 | 1700 | 0.6767 | 0.5 |
| 0.6854 | 6.0 | 2040 | 0.6876 | 0.5 |
| 0.6854 | 7.0 | 2380 | 0.6705 | 0.5 |
| 0.6851 | 8.0 | 2720 | 0.6806 | 0.5 |
| 0.6845 | 9.0 | 3060 | 0.6881 | 0.5 |
| 0.6845 | 10.0 | 3400 | 0.6737 | 0.5 |
| 0.6835 | 11.0 | 3740 | 0.6734 | 0.5 |
| 0.6821 | 12.0 | 4080 | 0.7058 | 0.5 |
| 0.6821 | 13.0 | 4420 | 0.7057 | 0.5 |
| 0.682 | 14.0 | 4760 | 0.7057 | 0.5 |
| 0.6827 | 15.0 | 5100 | 0.6771 | 0.5 |
| 0.6827 | 16.0 | 5440 | 0.6848 | 0.5 |
| 0.6803 | 17.0 | 5780 | 0.7044 | 0.5 |
| 0.6821 | 18.0 | 6120 | 0.6720 | 0.4984 |
| 0.6821 | 19.0 | 6460 | 0.6716 | 0.5 |
| 0.6784 | 20.0 | 6800 | 0.6855 | 0.5 |
| 0.6821 | 21.0 | 7140 | 0.6705 | 0.5 |
| 0.6821 | 22.0 | 7480 | 0.6753 | 0.5 |
| 0.6888 | 23.0 | 7820 | 0.6745 | 0.4953 |
| 0.6821 | 24.0 | 8160 | 0.6716 | 0.5 |
| 0.682 | 25.0 | 8500 | 0.6702 | 0.5 |
| 0.682 | 26.0 | 8840 | 0.6791 | 0.5 |
| 0.6829 | 27.0 | 9180 | 0.6771 | 0.5 |
| 0.6807 | 28.0 | 9520 | 0.6719 | 0.5 |
| 0.6807 | 29.0 | 9860 | 0.6739 | 0.5 |
| 0.6783 | 30.0 | 10200 | 0.6716 | 0.5 |
| 0.6789 | 31.0 | 10540 | 0.6706 | 0.5 |
| 0.6789 | 32.0 | 10880 | 0.7163 | 0.5 |
| 0.6798 | 33.0 | 11220 | 0.6703 | 0.5 |
| 0.6785 | 34.0 | 11560 | 0.6822 | 0.5 |
| 0.6785 | 35.0 | 11900 | 0.6715 | 0.5 |
| 0.6783 | 36.0 | 12240 | 0.6720 | 0.5 |
| 0.6781 | 37.0 | 12580 | 0.6733 | 0.5 |
| 0.6781 | 38.0 | 12920 | 0.6707 | 0.5 |
| 0.6798 | 39.0 | 13260 | 0.6950 | 0.5 |
| 0.6755 | 40.0 | 13600 | 0.6705 | 0.5 |
| 0.6755 | 41.0 | 13940 | 0.6715 | 0.5 |
| 0.6776 | 42.0 | 14280 | 0.6704 | 0.5 |
| 0.6772 | 43.0 | 14620 | 0.6789 | 0.5 |
| 0.6772 | 44.0 | 14960 | 0.6707 | 0.5 |
| 0.6755 | 45.0 | 15300 | 0.6925 | 0.5 |
| 0.6748 | 46.0 | 15640 | 0.6727 | 0.5 |
| 0.6748 | 47.0 | 15980 | 0.6801 | 0.5 |
| 0.6754 | 48.0 | 16320 | 0.6714 | 0.5 |
| 0.6762 | 49.0 | 16660 | 0.6882 | 0.5 |
| 0.6753 | 50.0 | 17000 | 0.6710 | 0.5 |
| 0.6753 | 51.0 | 17340 | 0.6707 | 0.5 |
| 0.6734 | 52.0 | 17680 | 0.6726 | 0.5063 |
| 0.678 | 53.0 | 18020 | 0.6727 | 0.5 |
| 0.678 | 54.0 | 18360 | 0.6751 | 0.5 |
| 0.6719 | 55.0 | 18700 | 0.6712 | 0.5 |
| 0.6726 | 56.0 | 19040 | 0.6721 | 0.5 |
| 0.6726 | 57.0 | 19380 | 0.6715 | 0.5 |
| 0.6732 | 58.0 | 19720 | 0.6717 | 0.5016 |
| 0.6736 | 59.0 | 20060 | 0.6819 | 0.5 |
| 0.6736 | 60.0 | 20400 | 0.6728 | 0.5141 |
| 0.6732 | 61.0 | 20740 | 0.6716 | 0.5016 |
| 0.6727 | 62.0 | 21080 | 0.6747 | 0.5 |
| 0.6727 | 63.0 | 21420 | 0.6715 | 0.4984 |
| 0.6726 | 64.0 | 21760 | 0.6737 | 0.5 |
| 0.6721 | 65.0 | 22100 | 0.6724 | 0.5 |
| 0.6721 | 66.0 | 22440 | 0.6744 | 0.5 |
| 0.6711 | 67.0 | 22780 | 0.6720 | 0.5 |
| 0.6725 | 68.0 | 23120 | 0.6722 | 0.4984 |
| 0.6725 | 69.0 | 23460 | 0.6722 | 0.4984 |
| 0.6713 | 70.0 | 23800 | 0.6722 | 0.4984 |
| 0.6708 | 71.0 | 24140 | 0.6743 | 0.5 |
| 0.6708 | 72.0 | 24480 | 0.6794 | 0.5 |
| 0.6703 | 73.0 | 24820 | 0.6756 | 0.5 |
| 0.6702 | 74.0 | 25160 | 0.6760 | 0.5 |
| 0.6688 | 75.0 | 25500 | 0.6741 | 0.4984 |
| 0.6688 | 76.0 | 25840 | 0.6753 | 0.5 |
| 0.67 | 77.0 | 26180 | 0.6730 | 0.4984 |
| 0.6688 | 78.0 | 26520 | 0.6751 | 0.4984 |
| 0.6688 | 79.0 | 26860 | 0.6750 | 0.4984 |
| 0.6685 | 80.0 | 27200 | 0.6751 | 0.4984 |
### Framework versions
- Transformers 4.26.1
- PyTorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3