
R-facebook-bart-base-full-ft-with-tum-nlp-german-gpt2_easy-prior-pp-no-ls-4c77

This model is a fine-tuned version of facebook/bart-base on an unspecified dataset. It achieves the following results on the evaluation set (a hedged usage sketch follows the metrics list):

  • Loss: 4.1506
  • Sacrebleu: 7.6134
  • Bleu: 0.0761
  • Rouge1: 0.3006
  • Rouge2: 0.1038
  • Rougel: 0.2079
  • Sari: 39.5909
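
The card does not document the intended task, but the checkpoint name and the SARI metric suggest German text simplification with a BART seq2seq model. The sketch below shows one way to load and query such a checkpoint with the Transformers auto classes; the model path and the German input sentence are placeholders, not taken from the card.

```python
# Hedged usage sketch: MODEL_PATH is a placeholder (local path or Hub id with its
# namespace, which the card does not state); the input sentence is invented.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_PATH = "./R-facebook-bart-base-full-ft-with-tum-nlp-german-gpt2_easy-prior-pp-no-ls-4c77"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_PATH)

text = "Die Verwaltung hat ein neues Verfahren zur Beantragung von Ausweisdokumenten eingeführt."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, num_beams=4, max_length=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```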

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding Seq2SeqTrainingArguments follows this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 15
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
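
As a rough guide, the hyperparameters above map onto transformers Seq2SeqTrainingArguments as sketched below (Transformers 4.29.x). The output_dir and the 100-step evaluation interval (inferred from the results table) are assumptions, and the Adam betas/epsilon listed above are the optimizer defaults.

```python
# Hedged mapping of the listed hyperparameters to Seq2SeqTrainingArguments.
# output_dir is a placeholder; eval every 100 steps is inferred from the results table.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-base-german-simplification",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # 4 x 8 = effective train batch size of 32
    num_train_epochs=15,
    lr_scheduler_type="linear",
    warmup_steps=100,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    label_smoothing_factor=0.1,
    evaluation_strategy="steps",
    eval_steps=100,                  # assumed from the 100-step rows in the results table
    predict_with_generate=True,      # needed to compute BLEU/ROUGE/SARI during eval
)
```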

Training results

| Training Loss | Epoch | Step | Validation Loss | Sacrebleu | Bleu | Rouge1 | Rouge2 | Rougel | Sari |
|---------------|-------|------|-----------------|-----------|------|--------|--------|--------|------|
| 6.9721 | 0.25 | 100 | 4.1739 | 1.8048 | 0.0180 | 0.1980 | 0.0611 | 0.1541 | 37.1235 |
| 3.8977 | 0.5 | 200 | 4.0984 | 1.2756 | 0.0128 | 0.2076 | 0.0678 | 0.1581 | 37.6186 |
| 4.035 | 0.75 | 300 | 4.0622 | 2.6499 | 0.0265 | 0.2271 | 0.0740 | 0.1741 | 38.1373 |
| 8.2055 | 0.99 | 400 | 4.0561 | 2.7363 | 0.0274 | 0.2332 | 0.0804 | 0.1716 | 38.0851 |
| 3.6957 | 1.24 | 500 | 4.0262 | 3.5110 | 0.0351 | 0.2560 | 0.0852 | 0.1852 | 37.9403 |
| 3.0846 | 1.49 | 600 | 4.0121 | 3.2967 | 0.0330 | 0.2471 | 0.0815 | 0.1799 | 37.5590 |
| 3.283 | 1.74 | 700 | 4.0510 | 3.8512 | 0.0385 | 0.2602 | 0.0917 | 0.1951 | 38.0037 |
| 4.7429 | 1.99 | 800 | 4.0048 | 3.4891 | 0.0349 | 0.2524 | 0.0850 | 0.1877 | 38.0324 |
| 3.024 | 2.24 | 900 | 3.9860 | 3.9202 | 0.0392 | 0.2633 | 0.0844 | 0.1891 | 37.9931 |
| 5.6861 | 2.49 | 1000 | 4.0493 | 4.4801 | 0.0448 | 0.2622 | 0.0878 | 0.1926 | 38.2052 |
| 3.6185 | 2.74 | 1100 | 4.0394 | 3.6710 | 0.0367 | 0.2608 | 0.0857 | 0.1866 | 37.9620 |
| 3.3582 | 2.98 | 1200 | 4.0004 | 5.1257 | 0.0513 | 0.2695 | 0.0922 | 0.1956 | 38.4845 |
| 5.0036 | 3.23 | 1300 | 4.0223 | 5.3256 | 0.0533 | 0.2752 | 0.0938 | 0.1975 | 38.6943 |
| 3.9904 | 3.48 | 1400 | 4.0040 | 5.0070 | 0.0501 | 0.2744 | 0.0927 | 0.1951 | 38.5338 |
| 3.1496 | 3.73 | 1500 | 4.0282 | 5.9234 | 0.0592 | 0.2803 | 0.0907 | 0.2002 | 38.2119 |
| 3.9604 | 3.98 | 1600 | 4.0253 | 5.1875 | 0.0519 | 0.2658 | 0.0864 | 0.1920 | 38.2336 |
| 2.9813 | 4.23 | 1700 | 4.0148 | 5.9589 | 0.0596 | 0.2891 | 0.0976 | 0.2028 | 38.8216 |
| 3.5448 | 4.48 | 1800 | 4.0071 | 5.2759 | 0.0528 | 0.2736 | 0.0867 | 0.1894 | 37.8800 |
| 3.6836 | 4.72 | 1900 | 4.0105 | 5.1414 | 0.0514 | 0.2750 | 0.0894 | 0.1982 | 38.3898 |
| 4.0471 | 4.97 | 2000 | 3.9788 | 5.5747 | 0.0557 | 0.2792 | 0.0932 | 0.1973 | 38.5705 |
| 3.3437 | 5.22 | 2100 | 4.0057 | 5.3969 | 0.0540 | 0.2827 | 0.0926 | 0.1978 | 38.3453 |
| 3.1657 | 5.47 | 2200 | 4.0439 | 5.4820 | 0.0548 | 0.2861 | 0.0946 | 0.2071 | 38.4004 |
| 2.5486 | 5.72 | 2300 | 4.0315 | 6.1738 | 0.0617 | 0.2896 | 0.0966 | 0.2048 | 38.5404 |
| 3.6148 | 5.97 | 2400 | 4.0056 | 6.5570 | 0.0656 | 0.2941 | 0.1046 | 0.2072 | 39.0698 |
| 3.1477 | 6.22 | 2500 | 4.0612 | 6.2221 | 0.0622 | 0.2806 | 0.0932 | 0.1998 | 38.5211 |
| 3.175 | 6.47 | 2600 | 4.0126 | 6.6920 | 0.0669 | 0.2916 | 0.1037 | 0.2122 | 39.1438 |
| 4.6616 | 6.71 | 2700 | 4.0467 | 6.0344 | 0.0603 | 0.2804 | 0.0953 | 0.1983 | 38.4171 |
| 3.109 | 6.96 | 2800 | 4.0420 | 5.8656 | 0.0587 | 0.2864 | 0.0983 | 0.2034 | 38.7225 |
| 3.0659 | 7.21 | 2900 | 4.0613 | 5.6029 | 0.0560 | 0.2839 | 0.0938 | 0.1980 | 38.7136 |
| 2.658 | 7.46 | 3000 | 4.0726 | 6.2791 | 0.0628 | 0.2824 | 0.0947 | 0.1972 | 38.6330 |
| 3.178 | 7.71 | 3100 | 4.0437 | 6.4351 | 0.0644 | 0.2924 | 0.0956 | 0.2032 | 38.6577 |
| 4.0606 | 7.96 | 3200 | 4.0644 | 6.6271 | 0.0663 | 0.2966 | 0.1019 | 0.2088 | 39.1513 |
| 3.664 | 8.21 | 3300 | 4.0615 | 6.3354 | 0.0634 | 0.2961 | 0.0981 | 0.2024 | 38.6904 |
| 2.8457 | 8.46 | 3400 | 4.0861 | 7.4278 | 0.0743 | 0.2975 | 0.1025 | 0.2017 | 39.0452 |
| 3.3883 | 8.7 | 3500 | 4.1037 | 6.4498 | 0.0645 | 0.2826 | 0.0955 | 0.2008 | 38.5961 |
| 5.4189 | 8.95 | 3600 | 4.1099 | 6.0065 | 0.0601 | 0.2946 | 0.0952 | 0.2020 | 38.6177 |
| 3.2093 | 9.2 | 3700 | 4.1074 | 6.2514 | 0.0625 | 0.2933 | 0.0942 | 0.2014 | 38.7227 |
| 3.9625 | 9.45 | 3800 | 4.0937 | 6.6653 | 0.0667 | 0.2912 | 0.0970 | 0.2020 | 38.4853 |
| 2.7172 | 9.7 | 3900 | 4.1130 | 6.1736 | 0.0617 | 0.2860 | 0.0898 | 0.1948 | 38.5064 |
| 2.4973 | 9.95 | 4000 | 4.0737 | 7.4889 | 0.0749 | 0.2986 | 0.1023 | 0.2060 | 39.2124 |
| 2.7371 | 10.2 | 4100 | 4.1032 | 6.4897 | 0.0649 | 0.2985 | 0.0990 | 0.2031 | 38.3514 |
| 3.9244 | 10.44 | 4200 | 4.0880 | 6.7268 | 0.0673 | 0.2906 | 0.1006 | 0.2012 | 38.6404 |
| 3.2153 | 10.69 | 4300 | 4.0961 | 6.7780 | 0.0678 | 0.2953 | 0.0977 | 0.2008 | 38.7091 |
| 3.0715 | 10.94 | 4400 | 4.1005 | 7.1435 | 0.0714 | 0.2870 | 0.0937 | 0.1950 | 38.5542 |
| 2.7833 | 11.19 | 4500 | 4.1112 | 7.5856 | 0.0759 | 0.3008 | 0.1037 | 0.2063 | 38.8659 |
| 5.6278 | 11.44 | 4600 | 4.0988 | 7.8870 | 0.0789 | 0.2962 | 0.1019 | 0.2025 | 38.8174 |
| 4.3557 | 11.69 | 4700 | 4.1049 | 7.9121 | 0.0791 | 0.3105 | 0.1076 | 0.2106 | 39.2476 |
| 3.4938 | 11.94 | 4800 | 4.1067 | 7.1602 | 0.0716 | 0.2961 | 0.1009 | 0.2039 | 38.9165 |
| 5.6848 | 12.19 | 4900 | 4.1140 | 7.8746 | 0.0787 | 0.2951 | 0.0996 | 0.2005 | 38.7719 |
| 3.4738 | 12.43 | 5000 | 4.0969 | 7.8672 | 0.0787 | 0.3055 | 0.1087 | 0.2092 | 39.0808 |
| 2.9039 | 12.68 | 5100 | 4.1185 | 7.6696 | 0.0767 | 0.3033 | 0.1071 | 0.2092 | 39.0788 |
| 4.4091 | 12.93 | 5200 | 4.1346 | 7.9896 | 0.0799 | 0.3014 | 0.1046 | 0.2070 | 39.2032 |
| 3.102 | 13.18 | 5300 | 4.1308 | 7.2969 | 0.0730 | 0.3030 | 0.1032 | 0.2039 | 39.1031 |
| 2.9972 | 13.43 | 5400 | 4.1518 | 7.7779 | 0.0778 | 0.3017 | 0.1053 | 0.2090 | 39.4092 |
| 2.7672 | 13.68 | 5500 | 4.1515 | 7.7545 | 0.0775 | 0.3010 | 0.1079 | 0.2091 | 39.0093 |
| 3.7358 | 13.93 | 5600 | 4.1360 | 7.5980 | 0.0760 | 0.2970 | 0.1036 | 0.2080 | 39.0873 |
| 3.4363 | 14.17 | 5700 | 4.1367 | 7.2901 | 0.0729 | 0.3013 | 0.1057 | 0.2084 | 39.3389 |
| 3.3451 | 14.42 | 5800 | 4.1500 | 7.5605 | 0.0756 | 0.2984 | 0.0979 | 0.2074 | 39.0107 |
| 2.8616 | 14.67 | 5900 | 4.1447 | 7.8204 | 0.0782 | 0.3020 | 0.1059 | 0.2127 | 39.7465 |
| 3.1149 | 14.92 | 6000 | 4.1506 | 7.6134 | 0.0761 | 0.3006 | 0.1038 | 0.2079 | 39.5909 |
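
The SacreBLEU, BLEU, ROUGE, and SARI columns above correspond to the standard implementations available through the Hugging Face evaluate library. Below is a minimal sketch of the metric calls; the source, prediction, and reference strings are invented purely to show the expected input shapes, not data from the card's evaluation set.

```python
# Metric-computation sketch with the `evaluate` library; all strings are invented examples.
import evaluate

sources = ["Die Behörde verlangt eine fristgerechte Einreichung der Unterlagen."]
predictions = ["Die Behörde will die Unterlagen pünktlich bekommen."]
references = [["Sie müssen die Unterlagen rechtzeitig abgeben."]]  # one or more references per prediction

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
sari = evaluate.load("sari")

print(sacrebleu.compute(predictions=predictions, references=references)["score"])
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))
print(sari.compute(sources=sources, predictions=predictions, references=references)["sari"])
```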

Framework versions

  • Transformers 4.29.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3