20230901052747

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. After the final training epoch (epoch 80, step 27200), it achieves the following results on the evaluation set:

Loss: 0.1598
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0007
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
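
As a rough illustration of what these settings imply, the sketch below collects them in a plain dictionary and computes the learning rate that a linear schedule would produce at a given step. The key names are chosen to match keyword arguments of `transformers.TrainingArguments`, which is an assumed mapping rather than the author's actual script. The helper `linear_lr` and the choice of `warmup_steps=0` are also assumptions, since the card lists no warmup setting. The total of 27200 steps is taken from the last row of the training results table.

```python
# Hyperparameters copied from the card. Key names follow the keyword
# arguments of transformers.TrainingArguments (an assumed mapping).
training_kwargs = {
    "learning_rate": 7e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 80.0,
}

TOTAL_STEPS = 27200  # final step reported in the training results table


def linear_lr(step, base_lr=training_kwargs["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not report a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of training.
for step in (0, 13600, 27200):
    print(step, linear_lr(step))
```

If the `transformers` library is installed, the dictionary could be passed along the lines of `TrainingArguments(output_dir="out", **training_kwargs)`, where `"out"` is a placeholder path.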

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1590	0.5
0.1856	2.0	680	0.1599	0.5
0.171	3.0	1020	0.1598	0.5
0.171	4.0	1360	0.1603	0.5
0.169	5.0	1700	0.1644	0.5
0.1669	6.0	2040	0.1598	0.5
0.1669	7.0	2380	0.1600	0.5
0.1691	8.0	2720	0.1599	0.5
0.1658	9.0	3060	0.1602	0.5
0.1658	10.0	3400	0.1673	0.5
0.1688	11.0	3740	0.1638	0.5
0.1677	12.0	4080	0.1624	0.5
0.1677	13.0	4420	0.1597	0.5
0.1664	14.0	4760	0.1752	0.5
0.1681	15.0	5100	0.1664	0.5
0.1681	16.0	5440	0.1644	0.5
0.1679	17.0	5780	0.1667	0.5
0.166	18.0	6120	0.1618	0.5
0.166	19.0	6460	0.1607	0.5
0.1668	20.0	6800	0.1598	0.5
0.1666	21.0	7140	0.1620	0.5
0.1666	22.0	7480	0.1650	0.5
0.1673	23.0	7820	0.1600	0.5
0.1662	24.0	8160	0.1608	0.5
0.1677	25.0	8500	0.1599	0.5
0.1677	26.0	8840	0.1605	0.5
0.1665	27.0	9180	0.1608	0.5
0.1656	28.0	9520	0.1656	0.5
0.1656	29.0	9860	0.1605	0.5
0.1674	30.0	10200	0.1614	0.5
0.1663	31.0	10540	0.1609	0.5
0.1663	32.0	10880	0.1602	0.5
0.1667	33.0	11220	0.1614	0.5
0.1666	34.0	11560	0.1611	0.5
0.1666	35.0	11900	0.1663	0.5
0.1665	36.0	12240	0.1597	0.5
0.1662	37.0	12580	0.1606	0.5
0.1662	38.0	12920	0.1683	0.5
0.1653	39.0	13260	0.1597	0.5
0.1658	40.0	13600	0.1597	0.5
0.1658	41.0	13940	0.1599	0.5
0.1659	42.0	14280	0.1597	0.5
0.1648	43.0	14620	0.1656	0.5
0.1648	44.0	14960	0.1612	0.5
0.1663	45.0	15300	0.1630	0.5
0.1649	46.0	15640	0.1605	0.5
0.1649	47.0	15980	0.1599	0.5
0.1663	48.0	16320	0.1595	0.5
0.1645	49.0	16660	0.1612	0.5
0.1652	50.0	17000	0.1599	0.5
0.1652	51.0	17340	0.1614	0.5
0.1638	52.0	17680	0.1613	0.5
0.1656	53.0	18020	0.1652	0.5
0.1656	54.0	18360	0.1623	0.5
0.1643	55.0	18700	0.1621	0.5
0.1645	56.0	19040	0.1596	0.5
0.1645	57.0	19380	0.1601	0.5
0.1636	58.0	19720	0.1634	0.5
0.1648	59.0	20060	0.1602	0.5
0.1648	60.0	20400	0.1598	0.5
0.1642	61.0	20740	0.1642	0.5
0.1635	62.0	21080	0.1620	0.5
0.1635	63.0	21420	0.1612	0.5
0.1631	64.0	21760	0.1655	0.5
0.1653	65.0	22100	0.1604	0.5
0.1653	66.0	22440	0.1602	0.5
0.162	67.0	22780	0.1605	0.5
0.1654	68.0	23120	0.1605	0.5
0.1654	69.0	23460	0.1595	0.5
0.1637	70.0	23800	0.1596	0.5
0.1635	71.0	24140	0.1604	0.5
0.1635	72.0	24480	0.1599	0.5
0.163	73.0	24820	0.1622	0.5
0.1636	74.0	25160	0.1598	0.5
0.1629	75.0	25500	0.1597	0.5
0.1629	76.0	25840	0.1596	0.5
0.1633	77.0	26180	0.1598	0.5
0.1633	78.0	26520	0.1606	0.5
0.1633	79.0	26860	0.1597	0.5
0.1632	80.0	27200	0.1598	0.5
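
The table can be checked with a little arithmetic. The sketch below parses a few rows copied from it, derives the number of optimizer steps per epoch, and bounds the size of the training set that this implies. The bound assumes a single device, no gradient accumulation, and a final partial batch that is kept, none of which the card states. The function `parse_rows` is a hypothetical helper written for this example.

```python
# Four rows copied from the table above (tab-separated columns:
# training loss, epoch, step, validation loss, accuracy).
rows = (
    "No log\t1.0\t340\t0.1590\t0.5\n"
    "0.1663\t48.0\t16320\t0.1595\t0.5\n"
    "0.1654\t69.0\t23460\t0.1595\t0.5\n"
    "0.1632\t80.0\t27200\t0.1598\t0.5\n"
)

TRAIN_BATCH_SIZE = 16  # train_batch_size from the hyperparameter list


def parse_rows(text):
    """Turn tab-separated table rows into a list of dictionaries."""
    records = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, accuracy = line.split("\t")
        records.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train_loss == "No log" else float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "accuracy": float(accuracy),
        })
    return records


records = parse_rows(rows)
last = records[-1]

# Optimizer steps per epoch, from the final row.
steps_per_epoch = int(last["step"] / last["epoch"])

# With a possibly partial last batch, the training set size lies in this range
# (assumes one device and no gradient accumulation).
max_examples = steps_per_epoch * TRAIN_BATCH_SIZE
min_examples = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1

# Row with the lowest validation loss, and the set of accuracies seen.
best = min(records, key=lambda r: r["val_loss"])
accuracies = {r["accuracy"] for r in records}

print("steps per epoch:", steps_per_epoch)
print("training examples between", min_examples, "and", max_examples)
print("lowest validation loss in these rows at epoch", best["epoch"])
print("distinct accuracy values:", accuracies)
```

Pasting the full table into `rows` in place of the four sample rows runs the same checks over all 80 epochs.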

Framework versions

Transformers 4.26.1
Pytorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
