20230901052720

This model is a fine-tuned version of bert-large-cased on the super_glue dataset. At the end of training (epoch 80) it achieves the following results on the evaluation set:

Loss: 0.1565
Accuracy: 0.5

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 16
eval_batch_size: 8
seed: 11
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 80.0
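
The list above maps fairly directly onto keyword arguments of `transformers.TrainingArguments`. The sketch below is one guess at how such a run could be configured, not the author's actual script. It assumes a single device (so the reported batch sizes are per-device values) and no warmup (none is listed), and it keeps the settings in a plain dictionary so it runs without Transformers installed. The helper `linear_lr` is a hypothetical name; it shows the learning rate a linear decay would give over the 27,200 steps reported in the training results.

```python
# Hyperparameters from the card, keyed by the matching
# transformers.TrainingArguments parameter names.
# Assumption: single device, so the card's batch sizes are per-device values.
training_kwargs = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "seed": 11,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 80.0,
}

STEPS_PER_EPOCH = 340  # from the results table: step 340 at epoch 1.0
TOTAL_STEPS = int(STEPS_PER_EPOCH * training_kwargs["num_train_epochs"])


def linear_lr(step, base_lr=training_kwargs["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under linear decay to zero.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 20, 40, 80):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2}  step {step:>5}  lr {linear_lr(step):.2e}")
```

With Transformers installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `out` is a placeholder directory.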

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	1.0	340	0.1594	0.5
0.1901	2.0	680	0.1620	0.5
0.1693	3.0	1020	0.1564	0.5
0.1693	4.0	1360	0.1563	0.5
0.1657	5.0	1700	0.1575	0.5
0.1638	6.0	2040	0.1594	0.5
0.1638	7.0	2380	0.1557	0.5
0.1632	8.0	2720	0.1568	0.5
0.1621	9.0	3060	0.1606	0.5
0.1621	10.0	3400	0.1614	0.5
0.1661	11.0	3740	0.1569	0.5
0.1641	12.0	4080	0.1570	0.5
0.1641	13.0	4420	0.1555	0.5
0.1582	14.0	4760	0.1627	0.5
0.1598	15.0	5100	0.1558	0.5
0.1598	16.0	5440	0.1557	0.5
0.16	17.0	5780	0.1558	0.5
0.1571	18.0	6120	0.1560	0.5
0.1571	19.0	6460	0.1553	0.5
0.1594	20.0	6800	0.1556	0.5
0.1581	21.0	7140	0.1635	0.5
0.1581	22.0	7480	0.1562	0.5
0.1585	23.0	7820	0.1578	0.5
0.1574	24.0	8160	0.1561	0.5
0.1585	25.0	8500	0.1561	0.5
0.1585	26.0	8840	0.1567	0.5
0.1573	27.0	9180	0.1559	0.5
0.1569	28.0	9520	0.1624	0.5
0.1569	29.0	9860	0.1559	0.5
0.1578	30.0	10200	0.1570	0.5
0.1569	31.0	10540	0.1598	0.5
0.1569	32.0	10880	0.1564	0.5
0.1569	33.0	11220	0.1611	0.5
0.1567	34.0	11560	0.1578	0.5
0.1567	35.0	11900	0.1567	0.5
0.1573	36.0	12240	0.1562	0.5
0.1564	37.0	12580	0.1574	0.5
0.1564	38.0	12920	0.1609	0.5
0.1553	39.0	13260	0.1574	0.5
0.156	40.0	13600	0.1578	0.5
0.156	41.0	13940	0.1580	0.5
0.1564	42.0	14280	0.1589	0.5
0.1551	43.0	14620	0.1564	0.5
0.1551	44.0	14960	0.1579	0.5
0.1563	45.0	15300	0.1569	0.5
0.1555	46.0	15640	0.1564	0.5
0.1555	47.0	15980	0.1558	0.5
0.1568	48.0	16320	0.1569	0.5
0.1554	49.0	16660	0.1560	0.5
0.1558	50.0	17000	0.1571	0.5
0.1558	51.0	17340	0.1564	0.5
0.1554	52.0	17680	0.1565	0.5
0.1567	53.0	18020	0.1573	0.5
0.1567	54.0	18360	0.1567	0.5
0.1556	55.0	18700	0.1563	0.5
0.1555	56.0	19040	0.1566	0.5
0.1555	57.0	19380	0.1561	0.5
0.1551	58.0	19720	0.1559	0.5
0.156	59.0	20060	0.1571	0.5
0.156	60.0	20400	0.1561	0.5
0.155	61.0	20740	0.1569	0.5
0.1548	62.0	21080	0.1561	0.5
0.1548	63.0	21420	0.1561	0.5
0.1542	64.0	21760	0.1584	0.5
0.1562	65.0	22100	0.1566	0.5
0.1562	66.0	22440	0.1565	0.5
0.1528	67.0	22780	0.1562	0.5
0.1562	68.0	23120	0.1566	0.5
0.1562	69.0	23460	0.1562	0.5
0.155	70.0	23800	0.1568	0.5
0.1544	71.0	24140	0.1566	0.5
0.1544	72.0	24480	0.1561	0.5
0.1543	73.0	24820	0.1562	0.5
0.1546	74.0	25160	0.1563	0.5
0.1542	75.0	25500	0.1563	0.5
0.1542	76.0	25840	0.1565	0.5
0.1548	77.0	26180	0.1566	0.5
0.1543	78.0	26520	0.1563	0.5
0.1543	79.0	26860	0.1567	0.5
0.1542	80.0	27200	0.1565	0.5
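
The log above can also be inspected programmatically. The following is a minimal sketch, assuming the table is available as tab-separated text. `parse_log` and `summarize` are hypothetical helper names, and only the first four rows are embedded here as an excerpt.

```python
import csv
import io

# First rows of the table above, tab-separated as logged.
# Paste the full table in place of this excerpt to cover all 80 epochs.
LOG_EXCERPT = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tAccuracy\n"
    "No log\t1.0\t340\t0.1594\t0.5\n"
    "0.1901\t2.0\t680\t0.1620\t0.5\n"
    "0.1693\t3.0\t1020\t0.1564\t0.5\n"
    "0.1693\t4.0\t1360\t0.1563\t0.5\n"
)


def parse_log(text):
    """Parse the tab-separated log into a list of typed row dictionaries."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        train = rec["Training Loss"]
        rows.append({
            # "No log" means no training loss had been logged yet.
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(rec["Epoch"]),
            "step": int(rec["Step"]),
            "val_loss": float(rec["Validation Loss"]),
            "accuracy": float(rec["Accuracy"]),
        })
    return rows


def summarize(rows):
    """Return the lowest-validation-loss epoch and the distinct accuracies."""
    best = min(rows, key=lambda r: r["val_loss"])
    return {
        "best_epoch": best["epoch"],
        "best_val_loss": best["val_loss"],
        "accuracy_values": sorted({r["accuracy"] for r in rows}),
    }


if __name__ == "__main__":
    print(summarize(parse_log(LOG_EXCERPT)))
```

A single distinct value in `accuracy_values` reflects what the rows show: accuracy is 0.5 at every logged epoch.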

Framework versions

Transformers 4.26.1
PyTorch 2.0.1+cu118
Datasets 2.12.0
Tokenizers 0.13.3
