20230901103238

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

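Loss: 0.1604
Accuracy: 0.5

These figures match the final row (epoch 80) of the training results table. Validation accuracy is 0.5 at every epoch.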

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
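
With lr_scheduler_type set to linear, the learning rate falls from its peak toward zero over the whole run. The following is a minimal sketch of that decay, modeled on the formula used by `get_linear_schedule_with_warmup` in Transformers. It assumes no warmup (the card lists no warmup setting; 0 is the Trainer default) and 27,200 total optimizer steps, which is the step count in the final row of the training results table. The function name `linear_lr` is hypothetical.

```python
# Hyperparameters copied from the card.
PEAK_LR = 3e-4
TOTAL_STEPS = 27_200  # 80 epochs x 340 steps/epoch, read from the results table
# Assumption: no warmup. The card does not list one; 0 is the Trainer default.
WARMUP_STEPS = 0


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr (end of warmup) to 0 (at total_steps), never below 0.
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


if __name__ == "__main__":
    for step in (0, 6_800, 13_600, 27_200):
        print(f"step {step:>6}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate is halved by epoch 40 (step 13,600) and reaches zero exactly at the last step.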

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1601	0.5
0.1929	2.0	680	0.1664	0.5
0.1697	3.0	1020	0.1642	0.5
0.1697	4.0	1360	0.1567	0.5
0.1668	5.0	1700	0.1592	0.5
0.1636	6.0	2040	0.1606	0.5
0.1636	7.0	2380	0.1562	0.5
0.1636	8.0	2720	0.1562	0.5
0.1598	9.0	3060	0.1607	0.5
0.1598	10.0	3400	0.1642	0.5
0.1643	11.0	3740	0.1606	0.5
0.1677	12.0	4080	0.1649	0.5
0.1677	13.0	4420	0.1603	0.5
0.1651	14.0	4760	0.1602	0.5
0.1672	15.0	5100	0.1600	0.5
0.1672	16.0	5440	0.1602	0.5
0.1669	17.0	5780	0.1603	0.5
0.1642	18.0	6120	0.1600	0.5
0.1642	19.0	6460	0.1601	0.5
0.1666	20.0	6800	0.1615	0.5
0.1655	21.0	7140	0.1600	0.5
0.1655	22.0	7480	0.1601	0.5
0.1664	23.0	7820	0.1602	0.5
0.1655	24.0	8160	0.1608	0.5
0.1667	25.0	8500	0.1624	0.5
0.1667	26.0	8840	0.1606	0.5
0.1656	27.0	9180	0.1642	0.5
0.1647	28.0	9520	0.1600	0.5
0.1647	29.0	9860	0.1645	0.5
0.1665	30.0	10200	0.1618	0.5
0.1655	31.0	10540	0.1601	0.5
0.1655	32.0	10880	0.1606	0.5
0.1653	33.0	11220	0.1631	0.5
0.1655	34.0	11560	0.1623	0.5
0.1655	35.0	11900	0.1632	0.5
0.1655	36.0	12240	0.1609	0.5
0.1652	37.0	12580	0.1600	0.5
0.1652	38.0	12920	0.1601	0.5
0.1643	39.0	13260	0.1615	0.5
0.1652	40.0	13600	0.1634	0.5
0.1652	41.0	13940	0.1603	0.5
0.1655	42.0	14280	0.1600	0.5
0.1644	43.0	14620	0.1605	0.5
0.1644	44.0	14960	0.1612	0.5
0.166	45.0	15300	0.1609	0.5
0.1646	46.0	15640	0.1612	0.5
0.1646	47.0	15980	0.1631	0.5
0.1659	48.0	16320	0.1603	0.5
0.1648	49.0	16660	0.1606	0.5
0.1651	50.0	17000	0.1604	0.5
0.1651	51.0	17340	0.1605	0.5
0.1643	52.0	17680	0.1602	0.5
0.1658	53.0	18020	0.1643	0.5
0.1658	54.0	18360	0.1609	0.5
0.1648	55.0	18700	0.1607	0.5
0.1649	56.0	19040	0.1601	0.5
0.1649	57.0	19380	0.1618	0.5
0.1642	58.0	19720	0.1601	0.5
0.1654	59.0	20060	0.1667	0.5
0.1654	60.0	20400	0.1609	0.5
0.1644	61.0	20740	0.1603	0.5
0.1643	62.0	21080	0.1621	0.5
0.1643	63.0	21420	0.1600	0.5
0.1638	64.0	21760	0.1600	0.5
0.1661	65.0	22100	0.1601	0.5
0.1661	66.0	22440	0.1616	0.5
0.1626	67.0	22780	0.1600	0.5
0.166	68.0	23120	0.1601	0.5
0.166	69.0	23460	0.1600	0.5
0.1645	70.0	23800	0.1600	0.5
0.1644	71.0	24140	0.1601	0.5
0.1644	72.0	24480	0.1604	0.5
0.1638	73.0	24820	0.1612	0.5
0.1646	74.0	25160	0.1604	0.5
0.164	75.0	25500	0.1607	0.5
0.164	76.0	25840	0.1602	0.5
0.1644	77.0	26180	0.1603	0.5
0.1644	78.0	26520	0.1608	0.5
0.1644	79.0	26860	0.1603	0.5
0.1643	80.0	27200	0.1604	0.5
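
The Step column and the repeated Training Loss values can be explained with a little arithmetic. The sketch below is an illustration under stated assumptions, not a record of the actual run. It derives 340 steps per epoch from the final row (step 27,200 at epoch 80). It then bounds the training-set size, assuming a single device, no gradient accumulation, and a partial last batch that is kept rather than dropped. Finally, it reproduces the "No log" and repeated-value pattern, assuming training loss was logged every 500 steps (the Trainer default `logging_steps`; the card does not state it). All function names are hypothetical.

```python
from typing import Optional, Tuple

# Values read from the card: final results row and hyperparameter list.
FINAL_STEP = 27_200
NUM_EPOCHS = 80
TRAIN_BATCH_SIZE = 16
# Assumption: training loss logged every 500 steps (Trainer default), not stated in the card.
LOGGING_STEPS = 500


def steps_per_epoch(final_step: int = FINAL_STEP, num_epochs: int = NUM_EPOCHS) -> int:
    """Optimizer steps per epoch, assuming every epoch has the same length."""
    if final_step % num_epochs:
        raise ValueError("final step is not a whole multiple of the epoch count")
    return final_step // num_epochs


def train_size_bounds(steps: int, batch_size: int = TRAIN_BATCH_SIZE) -> Tuple[int, int]:
    """Smallest and largest example count N with ceil(N / batch_size) == steps.

    Assumes one device, no gradient accumulation, and a kept partial last batch.
    """
    return (steps - 1) * batch_size + 1, steps * batch_size


def last_logged_step(eval_step: int, logging_steps: int = LOGGING_STEPS) -> Optional[int]:
    """Step of the most recent training-loss log at or before eval_step; None means "No log"."""
    last = (eval_step // logging_steps) * logging_steps
    return last if last > 0 else None


if __name__ == "__main__":
    spe = steps_per_epoch()
    print("steps per epoch:", spe)
    print("training examples between:", train_size_bounds(spe))
    for epoch in range(1, 6):
        step = epoch * spe
        print(f"epoch {epoch} (step {step}): training loss taken from step {last_logged_step(step)}")
```

Under the 500-step assumption, no log exists before the epoch-1 evaluation at step 340, which is why that row reads "No log". Epochs 3 and 4 both report 0.1697 because no new log falls between steps 1,021 and 1,360.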

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
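
To compare a local environment against the versions above, the standard library's `importlib.metadata` is enough. This is a sketch. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption here. The comparison is an exact string match, so a torch build without the `+cu118` suffix is reported as differing.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed in the card; keys are assumed PyPI distribution names.
PINNED = {
    "transformers": "4.26.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.13.3",
}


def check_versions(pinned: Dict[str, str] = PINNED) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each package to (installed version or None, whether it exactly matches the pin)."""
    report: Dict[str, Tuple[Optional[str], bool]] = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name:<12} pinned {PINNED[name]:<12} installed {installed or '-':<12} {status}")
```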

Hub repository: dkqjrm/20230901103238