Glue_distilbert

This model is a fine-tuned version of distilbert-base-uncased on the GLUE MRPC dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1042
  • Accuracy: 0.8505
  • F1: 0.8961
  • Combined Score: 0.8733
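
The combined score is the mean of accuracy and F1: (0.8505 + 0.8961) / 2 = 0.8733. A minimal inference sketch follows, using the checkpoint id gokuls/Glue_distilbert from this card; the label convention (0 = not a paraphrase, 1 = paraphrase, as in GLUE MRPC) is an assumption unless the checkpoint config names its labels.

```python
# Minimal sketch: paraphrase detection with this checkpoint.
# Label order assumes the GLUE MRPC convention (0 = not equivalent, 1 = equivalent).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "gokuls/Glue_distilbert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

sent1 = "The company said the cuts will be completed by the end of the year."
sent2 = "The cuts should be finished by year's end, the company said."

inputs = tokenizer(sent1, sent2, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print("paraphrase" if pred == 1 else "not paraphrase")
```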

Model description

This checkpoint fine-tunes distilbert-base-uncased, a distilled, smaller version of BERT, for binary sentence-pair classification on the MRPC paraphrase-detection task from the GLUE benchmark.

Intended uses & limitations

The model is intended for detecting whether two English sentences are paraphrases, as in MRPC. Note that validation loss rises from 0.38 at epoch 1 to roughly 1.5 by the end of the run while training loss approaches zero (see the training results below), so the 50-epoch run overfits; the results reported above correspond to the epoch-21 checkpoint, which has the best combined score.

Training and evaluation data

The model was fine-tuned and evaluated on MRPC (Microsoft Research Paraphrase Corpus) from the GLUE benchmark: sentence pairs labeled for whether the two sentences are semantically equivalent. The metrics above are computed on the MRPC validation split, which can be loaded as shown below.
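
A short sketch of loading the task with the datasets library (the split names and fields are the standard GLUE MRPC ones):

```python
# Load the MRPC task of GLUE with the datasets library.
from datasets import load_dataset

raw = load_dataset("glue", "mrpc")
print(raw)              # DatasetDict with train / validation / test splits
print(raw["train"][0])  # fields: sentence1, sentence2, label, idx
```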

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 33
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
  • mixed_precision_training: Native AMP
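
The sketch below reconstructs a Trainer setup from the hyperparameters above. It is an approximation, not the original training script; the tokenization settings and the per-epoch evaluation strategy are assumptions (per-epoch evaluation matches the table of training results that follows).

```python
# Hedged reconstruction of the training setup from the listed hyperparameters.
# Adam betas/epsilon match the Trainer defaults, so they are not set explicitly.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

raw = load_dataset("glue", "mrpc")
encoded = raw.map(
    lambda ex: tokenizer(ex["sentence1"], ex["sentence2"], truncation=True),
    batched=True)

args = TrainingArguments(
    output_dir="Glue_distilbert",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=33,
    lr_scheduler_type="linear",
    num_train_epochs=50,
    fp16=True,                    # Native AMP mixed precision
    evaluation_strategy="epoch",  # assumption: matches the per-epoch table below
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```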

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Combined Score |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:--------------:|
| 0.5066        | 1.0   | 115  | 0.3833          | 0.8358   | 0.8851 | 0.8604         |
| 0.3227        | 2.0   | 230  | 0.4336          | 0.8309   | 0.8844 | 0.8577         |
| 0.1764        | 3.0   | 345  | 0.4943          | 0.8309   | 0.8757 | 0.8533         |
| 0.0792        | 4.0   | 460  | 0.7271          | 0.8431   | 0.8861 | 0.8646         |
| 0.058         | 5.0   | 575  | 0.8374          | 0.8456   | 0.8945 | 0.8700         |
| 0.0594        | 6.0   | 690  | 0.7570          | 0.8309   | 0.8816 | 0.8563         |
| 0.0395        | 7.0   | 805  | 0.8640          | 0.8431   | 0.8897 | 0.8664         |
| 0.03          | 8.0   | 920  | 0.9007          | 0.8260   | 0.8799 | 0.8529         |
| 0.0283        | 9.0   | 1035 | 0.9479          | 0.8211   | 0.8685 | 0.8448         |
| 0.0127        | 10.0  | 1150 | 1.0686          | 0.8431   | 0.8915 | 0.8673         |
| 0.0097        | 11.0  | 1265 | 1.0752          | 0.8431   | 0.8919 | 0.8675         |
| 0.0164        | 12.0  | 1380 | 1.0627          | 0.8284   | 0.8801 | 0.8543         |
| 0.007         | 13.0  | 1495 | 1.1466          | 0.8333   | 0.8815 | 0.8574         |
| 0.0132        | 14.0  | 1610 | 1.1442          | 0.8456   | 0.8938 | 0.8697         |
| 0.0125        | 15.0  | 1725 | 1.1716          | 0.8235   | 0.8771 | 0.8503         |
| 0.0174        | 16.0  | 1840 | 1.1187          | 0.8333   | 0.8790 | 0.8562         |
| 0.0171        | 17.0  | 1955 | 1.1053          | 0.8456   | 0.8938 | 0.8697         |
| 0.0026        | 18.0  | 2070 | 1.2011          | 0.8309   | 0.8787 | 0.8548         |
| 0.0056        | 19.0  | 2185 | 1.3085          | 0.8260   | 0.8748 | 0.8504         |
| 0.0067        | 20.0  | 2300 | 1.3042          | 0.8333   | 0.8803 | 0.8568         |
| 0.0129        | 21.0  | 2415 | 1.1042          | 0.8505   | 0.8961 | 0.8733         |
| 0.0149        | 22.0  | 2530 | 1.1575          | 0.8235   | 0.8820 | 0.8527         |
| 0.0045        | 23.0  | 2645 | 1.2359          | 0.8407   | 0.8900 | 0.8654         |
| 0.0029        | 24.0  | 2760 | 1.3823          | 0.8211   | 0.8744 | 0.8477         |
| 0.0074        | 25.0  | 2875 | 1.2394          | 0.8431   | 0.8904 | 0.8668         |
| 0.002         | 26.0  | 2990 | 1.4450          | 0.8333   | 0.8859 | 0.8596         |
| 0.0039        | 27.0  | 3105 | 1.5102          | 0.8284   | 0.8805 | 0.8545         |
| 0.0015        | 28.0  | 3220 | 1.4767          | 0.8431   | 0.8915 | 0.8673         |
| 0.0062        | 29.0  | 3335 | 1.5101          | 0.8407   | 0.8926 | 0.8666         |
| 0.0054        | 30.0  | 3450 | 1.3861          | 0.8382   | 0.8893 | 0.8637         |
| 0.0001        | 31.0  | 3565 | 1.4101          | 0.8456   | 0.8948 | 0.8702         |
| 0.0           | 32.0  | 3680 | 1.4203          | 0.8480   | 0.8963 | 0.8722         |
| 0.002         | 33.0  | 3795 | 1.4526          | 0.8431   | 0.8923 | 0.8677         |
| 0.0019        | 34.0  | 3910 | 1.6265          | 0.8260   | 0.8842 | 0.8551         |
| 0.0029        | 35.0  | 4025 | 1.4788          | 0.8456   | 0.8945 | 0.8700         |
| 0.0           | 36.0  | 4140 | 1.4668          | 0.8480   | 0.8956 | 0.8718         |
| 0.0007        | 37.0  | 4255 | 1.5248          | 0.8456   | 0.8945 | 0.8700         |
| 0.0           | 38.0  | 4370 | 1.5202          | 0.8480   | 0.8960 | 0.8720         |
| 0.0033        | 39.0  | 4485 | 1.5541          | 0.8358   | 0.8878 | 0.8618         |
| 0.0017        | 40.0  | 4600 | 1.5097          | 0.8407   | 0.8904 | 0.8655         |
| 0.0           | 41.0  | 4715 | 1.5301          | 0.8407   | 0.8904 | 0.8655         |
| 0.0           | 42.0  | 4830 | 1.4974          | 0.8407   | 0.8862 | 0.8634         |
| 0.0018        | 43.0  | 4945 | 1.5483          | 0.8382   | 0.8896 | 0.8639         |
| 0.0           | 44.0  | 5060 | 1.5071          | 0.8480   | 0.8931 | 0.8706         |
| 0.0           | 45.0  | 5175 | 1.5104          | 0.8480   | 0.8935 | 0.8708         |
| 0.0011        | 46.0  | 5290 | 1.5445          | 0.8382   | 0.8896 | 0.8639         |
| 0.0012        | 47.0  | 5405 | 1.5378          | 0.8431   | 0.8900 | 0.8666         |
| 0.0           | 48.0  | 5520 | 1.5577          | 0.8407   | 0.8881 | 0.8644         |
| 0.0009        | 49.0  | 5635 | 1.5431          | 0.8407   | 0.8885 | 0.8646         |
| 0.0002        | 50.0  | 5750 | 1.5383          | 0.8431   | 0.8904 | 0.8668         |
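
The accuracy, F1, and combined score above are the standard GLUE MRPC metrics; a sketch of recomputing them with the evaluate library is below (the predictions and references are placeholder values, and the combined score is the mean of accuracy and F1):

```python
# Sketch: computing the MRPC metrics reported above.
import evaluate

metric = evaluate.load("glue", "mrpc")

preds = [1, 0, 1, 1]  # placeholder model predictions
refs = [1, 0, 0, 1]   # placeholder gold labels

scores = metric.compute(predictions=preds, references=refs)
scores["combined_score"] = (scores["accuracy"] + scores["f1"]) / 2
print(scores)
```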

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.0+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2