20230901101200

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. It achieves the following results on the evaluation set:

Loss: 0.1593
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0007
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
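
The linear schedule listed above can be sketched in plain Python to show how the learning rate evolves over the run. This is a minimal illustration, not the training code: it assumes zero warmup steps (the card does not report warmup) and takes the 340 steps per epoch from the training results table. The names linear_lr, WARMUP_STEPS, and STEPS_PER_EPOCH are hypothetical.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 0.0007
NUM_EPOCHS = 80
STEPS_PER_EPOCH = 340  # from the training results table (step 340 at epoch 1.0)
WARMUP_STEPS = 0       # assumption: the card does not report any warmup

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 27200, matching the final logged step


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup (if any), then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    # Print the learning rate at a few points in the run.
    for epoch in (0, 20, 40, 60, 80):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d}  step {step:5d}  lr {linear_lr(step):.6f}")
```

Under these assumptions the rate starts at 0.0007, is halved by epoch 40, and reaches zero at step 27200.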

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1696	0.5
0.1874	2.0	680	0.1654	0.5
0.1712	3.0	1020	0.1626	0.5
0.1712	4.0	1360	0.1604	0.5
0.1706	5.0	1700	0.1658	0.5
0.1677	6.0	2040	0.1600	0.5
0.1677	7.0	2380	0.1608	0.5
0.1695	8.0	2720	0.1604	0.5
0.1669	9.0	3060	0.1605	0.5
0.1669	10.0	3400	0.1694	0.5
0.168	11.0	3740	0.1618	0.5
0.168	12.0	4080	0.1641	0.5
0.168	13.0	4420	0.1601	0.5
0.1667	14.0	4760	0.1601	0.5
0.1679	15.0	5100	0.1640	0.5
0.1679	16.0	5440	0.1638	0.5
0.1681	17.0	5780	0.1636	0.5
0.1655	18.0	6120	0.1645	0.5
0.1655	19.0	6460	0.1627	0.5
0.1672	20.0	6800	0.1601	0.5
0.1672	21.0	7140	0.1618	0.5
0.1672	22.0	7480	0.1668	0.5
0.1675	23.0	7820	0.1599	0.5
0.1663	24.0	8160	0.1608	0.5
0.168	25.0	8500	0.1617	0.5
0.168	26.0	8840	0.1601	0.5
0.1667	27.0	9180	0.1604	0.5
0.1655	28.0	9520	0.1643	0.5
0.1655	29.0	9860	0.1605	0.5
0.1675	30.0	10200	0.1603	0.5
0.1664	31.0	10540	0.1602	0.5
0.1664	32.0	10880	0.1631	0.5
0.1666	33.0	11220	0.1611	0.5
0.167	34.0	11560	0.1616	0.5
0.167	35.0	11900	0.1613	0.5
0.1667	36.0	12240	0.1600	0.5
0.1662	37.0	12580	0.1600	0.5
0.1662	38.0	12920	0.1702	0.5
0.1652	39.0	13260	0.1599	0.5
0.1659	40.0	13600	0.1600	0.5
0.1659	41.0	13940	0.1605	0.5
0.1661	42.0	14280	0.1601	0.5
0.165	43.0	14620	0.1622	0.5
0.165	44.0	14960	0.1607	0.5
0.1664	45.0	15300	0.1621	0.5
0.1654	46.0	15640	0.1600	0.5
0.1654	47.0	15980	0.1606	0.5
0.1666	48.0	16320	0.1612	0.5
0.1652	49.0	16660	0.1600	0.5
0.1658	50.0	17000	0.1605	0.5
0.1658	51.0	17340	0.1604	0.5
0.1647	52.0	17680	0.1606	0.5
0.1657	53.0	18020	0.1641	0.5
0.1657	54.0	18360	0.1613	0.5
0.1644	55.0	18700	0.1605	0.5
0.1643	56.0	19040	0.1592	0.5
0.1643	57.0	19380	0.1600	0.5
0.1632	58.0	19720	0.1633	0.5
0.1643	59.0	20060	0.1612	0.5
0.1643	60.0	20400	0.1604	0.5
0.163	61.0	20740	0.1616	0.5
0.1623	62.0	21080	0.1598	0.5
0.1623	63.0	21420	0.1597	0.5
0.1616	64.0	21760	0.1655	0.5
0.1636	65.0	22100	0.1595	0.5
0.1636	66.0	22440	0.1599	0.5
0.1599	67.0	22780	0.1598	0.5
0.163	68.0	23120	0.1602	0.5
0.163	69.0	23460	0.1587	0.5
0.1613	70.0	23800	0.1604	0.5
0.1608	71.0	24140	0.1599	0.5
0.1608	72.0	24480	0.1587	0.5
0.1604	73.0	24820	0.1610	0.5
0.1606	74.0	25160	0.1592	0.5
0.1599	75.0	25500	0.1587	0.5
0.1599	76.0	25840	0.1593	0.5
0.1604	77.0	26180	0.1589	0.5
0.16	78.0	26520	0.1602	0.5
0.16	79.0	26860	0.1596	0.5
0.1599	80.0	27200	0.1593	0.5
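
A log like the one above can be summarized with a short script. The sketch below is illustrative only: it embeds four rows copied from the table (not the full log), and the function name summarize and its output keys are hypothetical. It reports the row with the lowest validation loss and whether accuracy ever changes.

```python
import csv
import io

# A few rows copied from the table above (tab-separated).
# Paste the full log here to summarize the whole run.
LOG = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "No log\t1.0\t340\t0.1696\t0.5\n"
    "0.1662\t38.0\t12920\t0.1702\t0.5\n"
    "0.163\t69.0\t23460\t0.1587\t0.5\n"
    "0.1599\t80.0\t27200\t0.1593\t0.5\n"
)


def summarize(log_text: str) -> dict:
    """Return the best-validation-loss row and whether accuracy is constant."""
    rows = list(csv.DictReader(io.StringIO(log_text), delimiter="\t"))
    # min() keeps the first row on ties, so the earliest best epoch is reported.
    best = min(rows, key=lambda r: float(r["Validation Loss"]))
    accuracies = {float(r["Accuracy"]) for r in rows}
    return {
        "best_epoch": float(best["Epoch"]),
        "best_val_loss": float(best["Validation Loss"]),
        "accuracy_is_constant": len(accuracies) == 1,
    }


if __name__ == "__main__":
    print(summarize(LOG))
```

In the full table, the lowest validation loss is 0.1587 (first reached at epoch 69, again at epochs 72 and 75), while accuracy stays at 0.5 in every epoch.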

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
