GUE_tf_3-seqsight_8192_512_30M-L32_all

This model is a fine-tuned version of mahdibaghbanzadeh/seqsight_8192_512_30M on the mahdibaghbanzadeh/GUE_tf_3 dataset. It achieves the following results on the evaluation set:

Loss: 0.6945
F1 Score: 0.6306
Accuracy: 0.634

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 2048
eval_batch_size: 2048
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
training_steps: 10000
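
As a rough illustration of what the linear scheduler implies for the settings above, the sketch below computes the learning rate at a given step. It assumes zero warmup steps, which this card does not state, and the helper name linear_lr is hypothetical. It follows the usual linear-decay-to-zero rule and is not taken from the training code itself.

```python
def linear_lr(step, base_lr=5e-4, total_steps=10_000, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to zero.

    base_lr and total_steps come from the hyperparameters listed above;
    warmup_steps=0 is an assumption, since the card does not report it.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 2_500, 5_000, 10_000):
        print(step, linear_lr(step))
```

Under these assumptions the rate starts at 0.0005, is halved by step 5000, and reaches zero at step 10000.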

Training results

Training Loss	Epoch	Step	Validation Loss	F1 Score	Accuracy
0.6652	14.29	200	0.6261	0.6351	0.647
0.6027	28.57	400	0.6331	0.6487	0.655
0.5569	42.86	600	0.6640	0.6571	0.657
0.5209	57.14	800	0.6659	0.6667	0.667
0.494	71.43	1000	0.7023	0.6501	0.65
0.4694	85.71	1200	0.7381	0.646	0.646
0.452	100.0	1400	0.7667	0.6200	0.622
0.4332	114.29	1600	0.7595	0.6270	0.627
0.4193	128.57	1800	0.7789	0.6348	0.635
0.405	142.86	2000	0.7961	0.6230	0.623
0.393	157.14	2200	0.8005	0.6279	0.628
0.3814	171.43	2400	0.9150	0.6064	0.608
0.3679	185.71	2600	0.8467	0.6221	0.622
0.3581	200.0	2800	0.8222	0.6150	0.616
0.3458	214.29	3000	0.8990	0.616	0.616
0.3343	228.57	3200	0.9159	0.6185	0.619
0.3241	242.86	3400	0.9124	0.6011	0.601
0.3145	257.14	3600	0.9340	0.6141	0.614
0.3054	271.43	3800	0.9421	0.6161	0.618
0.2955	285.71	4000	0.9610	0.6050	0.605
0.2851	300.0	4200	0.9503	0.6132	0.614
0.2787	314.29	4400	0.9691	0.6088	0.609
0.2713	328.57	4600	0.9770	0.6107	0.611
0.2643	342.86	4800	1.0160	0.5997	0.6
0.2568	357.14	5000	1.0290	0.6181	0.618
0.2495	371.43	5200	1.0194	0.6058	0.606
0.2435	385.71	5400	1.0307	0.6058	0.606
0.2382	400.0	5600	1.0560	0.6014	0.602
0.2318	414.29	5800	1.0271	0.6011	0.601
0.2279	428.57	6000	1.0710	0.6041	0.604
0.2202	442.86	6200	1.1111	0.5997	0.6
0.218	457.14	6400	1.0763	0.6051	0.605
0.2131	471.43	6600	1.0867	0.6120	0.612
0.2079	485.71	6800	1.1044	0.6080	0.608
0.2051	500.0	7000	1.0884	0.6141	0.614
0.2003	514.29	7200	1.1269	0.6081	0.608
0.1964	528.57	7400	1.1436	0.6058	0.606
0.1954	542.86	7600	1.1151	0.6030	0.603
0.1917	557.14	7800	1.1323	0.6081	0.608
0.1886	571.43	8000	1.1501	0.5968	0.597
0.1874	585.71	8200	1.1396	0.6041	0.604
0.1845	600.0	8400	1.1702	0.6050	0.605
0.1821	614.29	8600	1.1690	0.6031	0.603
0.1804	628.57	8800	1.1632	0.5978	0.598
0.1786	642.86	9000	1.1731	0.6009	0.601
0.1776	657.14	9200	1.1736	0.6030	0.603
0.177	671.43	9400	1.1712	0.5960	0.596
0.1747	685.71	9600	1.1700	0.6050	0.605
0.1731	700.0	9800	1.1720	0.5990	0.599
0.1742	714.29	10000	1.1726	0.6000	0.6
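
To make the trend in the table easier to read, the sketch below picks out the logged checkpoints with the lowest validation loss and the highest F1 score from a handful of rows copied from the table, and measures the gap between training and validation loss at the final step. The variable names are illustrative, and nothing here indicates which checkpoint produced the headline evaluation numbers at the top of this card.

```python
# A few rows copied from the table above:
# (step, training_loss, validation_loss, f1_score, accuracy)
rows = [
    (200, 0.6652, 0.6261, 0.6351, 0.647),
    (400, 0.6027, 0.6331, 0.6487, 0.655),
    (600, 0.5569, 0.6640, 0.6571, 0.657),
    (800, 0.5209, 0.6659, 0.6667, 0.667),
    (1000, 0.494, 0.7023, 0.6501, 0.65),
    (10000, 0.1742, 1.1726, 0.6000, 0.6),
]

# Checkpoint with the lowest validation loss and the one with the highest F1.
best_by_val_loss = min(rows, key=lambda r: r[2])
best_by_f1 = max(rows, key=lambda r: r[3])

# Gap between validation and training loss at the last logged step.
final = rows[-1]
final_gap = final[2] - final[1]

# Optimizer steps per epoch implied by the first row (step 200 at epoch 14.29).
steps_per_epoch = round(200 / 14.29)

print("lowest validation loss at step", best_by_val_loss[0])
print("highest F1 at step", best_by_f1[0])
print("final validation/training loss gap:", round(final_gap, 4))
print("approximate steps per epoch:", steps_per_epoch)
```

On these rows, validation loss is lowest at step 200 and F1 peaks at step 800, while training loss keeps falling through step 10000. That pattern is consistent with overfitting after the first few hundred steps, so an earlier checkpoint may be preferable to the final one.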

Framework versions

PEFT 0.9.0
Transformers 4.38.2
Pytorch 2.2.0+cu121
Datasets 2.17.1
Tokenizers 0.15.2
