results_mt5_large

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0000
Rouge1: 0.1433
Rouge2: 0.0234
Rougel: 0.1439
Rougelsum: 0.1439
Gen Len: 19.0
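
For readers unfamiliar with the metric, the sketch below illustrates what a ROUGE-1 F1 score measures: clipped unigram overlap between a generated text and a reference. It is a simplified illustration that assumes lowercase whitespace tokenization and no stemming. It is not the evaluation code that produced the numbers above, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a prediction and a reference.

    Simplified: lowercases and splits on whitespace, with no stemming.
    """
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("the cat", "the cat sat on the mat"))
```

On this scale, a Rouge1 of 0.1433 means only a small share of unigrams is shared between generated and reference texts.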

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 1
eval_batch_size: 1
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 10
mixed_precision_training: Native AMP
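
As a rough illustration of how the listed learning rate, linear scheduler, and 100 warmup steps combine, the sketch below computes the learning rate at a given optimizer step. The total step count is not stated in this card. The default of 21,570 is an estimate inferred from the training log (19,500 steps at epoch 9.04, extrapolated to 10 epochs) and should be treated as an assumption, as should the function name `linear_lr`.

```python
def linear_lr(step: int,
              base_lr: float = 5e-4,      # learning_rate from the list above
              warmup_steps: int = 100,    # lr_scheduler_warmup_steps
              total_steps: int = 21_570   # ASSUMPTION: estimated, not stated in the card
              ) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 50, 100, 10_835, 21_570):
        print(s, linear_lr(s))
```

Under these assumptions the rate peaks at 0.0005 at step 100 and falls to half that value roughly midway through training.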

Training results

Training Loss	Epoch	Step	Gen Len	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum
2.5321	0.23	500	16.1877	0.1695	0.1549	0.0284	0.1546	0.1542
2.6555	0.46	1000	13.5126	0.8393	0.1453	0.0271	0.1451	0.1453
0.4292	0.7	1500	18.9296	0.0667	0.1383	0.017	0.1389	0.1389
0.1733	0.93	2000	19.0	0.0385	0.1441	0.0216	0.1445	0.1447
0.114	1.16	2500	19.0	0.0248	0.1414	0.0209	0.1415	0.142
0.0858	1.39	3000	18.8375	0.0248	0.1398	0.021	0.1401	0.1402
0.0667	1.62	3500	19.0	0.0205	0.1442	0.024	0.1445	0.1445
0.053	1.86	4000	18.843	0.0164	0.1556	0.0352	0.1553	0.1562
0.0426	2.09	4500	18.9188	0.0140	0.1497	0.0287	0.1501	0.1504
0.0402	2.32	5000	18.7888	0.0152	0.1424	0.0231	0.1425	0.1425
0.0373	2.55	5500	18.87	0.0122	0.1598	0.0261	0.16	0.16
0.0328	2.78	6000	18.9242	0.0125	0.1456	0.0229	0.1457	0.1457
0.0303	3.01	6500	18.7708	0.0117	0.149	0.031	0.1491	0.1496
0.026	3.25	7000	19.0	0.0096	0.1435	0.0257	0.1431	0.1437
0.0238	3.48	7500	18.9242	0.0092	0.138	0.0213	0.1383	0.1388
0.0245	3.71	8000	18.9242	0.0090	0.1436	0.0238	0.1439	0.1438
0.0202	3.94	8500	18.9242	0.0100	0.1536	0.029	0.1543	0.1537
0.0194	4.17	9000	18.9747	0.0085	0.1413	0.0211	0.1414	0.1417
0.019	4.41	9500	18.9242	0.0073	0.1455	0.0228	0.1453	0.1457
0.0178	4.64	10000	18.8736	0.0068	0.1415	0.0173	0.1416	0.1421
0.0185	4.87	10500	19.0	0.0072	0.1385	0.0183	0.1389	0.1389
0.0169	5.1	11000	18.8989	0.0069	0.1518	0.0277	0.1516	0.1521
0.0165	5.33	11500	18.8989	0.0062	0.1616	0.035	0.1616	0.1618
0.0146	5.57	12000	19.0	0.0025	0.1433	0.0234	0.1439	0.1439
0.0096	5.8	12500	19.0	0.0012	0.1433	0.0234	0.1439	0.1439
0.0074	6.03	13000	19.0	0.0017	0.1433	0.0234	0.1439	0.1439
0.0056	6.26	13500	19.0	0.0011	0.1431	0.0232	0.1438	0.1437
0.0068	6.49	14000	19.0	0.0006	0.1433	0.0234	0.1439	0.1439
0.0087	6.73	14500	19.0	0.0007	0.1433	0.0234	0.1439	0.1439
0.005	6.96	15000	19.0	0.0005	0.1433	0.0234	0.1439	0.1439
0.0046	7.19	15500	19.0	0.0009	0.1433	0.0234	0.1439	0.1439
0.0049	7.42	16000	19.0	0.0003	0.1433	0.0234	0.1439	0.1439
0.004	7.65	16500	19.0	0.0004	0.1433	0.0234	0.1439	0.1439
0.0039	7.88	17000	19.0	0.0001	0.1433	0.0234	0.1439	0.1439
0.0031	8.12	17500	19.0	0.0005	0.1433	0.0234	0.1439	0.1439
0.0025	8.35	18000	19.0	0.0004	0.1433	0.0234	0.1439	0.1439
0.0027	8.58	18500	19.0	0.0001	0.1433	0.0234	0.1439	0.1439
0.0021	8.81	19000	19.0	0.0000	0.1433	0.0234	0.1439	0.1439
0.0026	9.04	19500	19.0	0.0000	0.1433	0.0234	0.1439	0.1439
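
In the log above, validation loss keeps falling until the end, while Rouge1 peaks at step 11500 (0.1616) and then settles at 0.1433. The sketch below shows one way to parse such rows and pick a checkpoint by ROUGE instead of by loss. It embeds only four rows copied from the table, and the column keys and the helper name `parse_log` are hypothetical.

```python
# Column keys follow the table header order; the names themselves are illustrative.
COLUMNS = ["train_loss", "epoch", "step", "gen_len", "val_loss",
           "rouge1", "rouge2", "rougel", "rougelsum"]

# Four rows copied from the training results table (steps 10500 to 12000).
LOG = """
0.0185 4.87 10500 19.0    0.0072 0.1385 0.0183 0.1389 0.1389
0.0169 5.1  11000 18.8989 0.0069 0.1518 0.0277 0.1516 0.1521
0.0165 5.33 11500 18.8989 0.0062 0.1616 0.035  0.1616 0.1618
0.0146 5.57 12000 19.0    0.0025 0.1433 0.0234 0.1439 0.1439
"""


def parse_log(text: str) -> list[dict]:
    """Turn whitespace-separated log rows into dictionaries keyed by COLUMNS."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        row = dict(zip(COLUMNS, values))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


rows = parse_log(LOG)
best_by_rouge = max(rows, key=lambda r: r["rouge1"])
best_by_loss = min(rows, key=lambda r: r["val_loss"])

if __name__ == "__main__":
    print("best Rouge1 at step", best_by_rouge["step"])
    print("lowest validation loss at step", best_by_loss["step"])
```

In this excerpt the two criteria select different steps, which is why the selection metric matters when choosing a checkpoint.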

Framework versions

Transformers 4.40.0.dev0
Pytorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2

hiba2/results_mt5_large
