
german-jeopardy-mt5-large-128

This model is a fine-tuned version of google/mt5-large on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5487
  • Brevity Penalty: 0.9115
  • System Length: 19029
  • Reference Length: 20793
  • ROUGE-1: 43.40
  • ROUGE-2: 23.68
  • ROUGE-L: 41.78
  • ROUGE-Lsum: 41.79
  • Exact Match: 3.18
  • BLEU: 16.06
  • F1: 42.29
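
As a consistency check on the scores above, the reported brevity penalty matches the standard BLEU definition, which penalizes system output that is shorter than the reference:

$$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}
\qquad\Rightarrow\qquad e^{\,1 - 20793/19029} \approx 0.9115,$$

where $c$ is the system length and $r$ the reference length.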

Model description

See google/mt5-large for the model architecture; the fine-tuned checkpoint has roughly 1.23B parameters stored as float32 (safetensors) weights.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
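
A minimal usage sketch in Python is shown below. The Hub model id is taken from this card; the input format, however, is an assumption borrowed from the common lmqg convention of highlighting the answer span with `<hl>` tokens, so verify it against the preprocessing actually used for this checkpoint.

```python
# Hedged usage sketch: the model id comes from this card, but the task prefix and
# <hl> answer highlighting are assumptions (common lmqg convention), not confirmed
# details of this checkpoint's preprocessing.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "GiantTreeG/german-jeopardy-mt5-large-128"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German context with the intended answer span marked by <hl> tokens (assumed format).
text = (
    "generate question: Die Hauptstadt von Deutschland ist <hl> Berlin <hl>. "
    "Die Stadt hat rund 3,7 Millionen Einwohner."
)
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```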

Training and evaluation data

See lmqg/qg_dequad.
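
For a quick look at the data, the dataset can be loaded directly from the Hub. The snippet below is only a sketch; inspect the column names rather than assuming them, since field layouts vary between lmqg datasets.

```python
# Inspect the lmqg/qg_dequad splits and columns before relying on specific fields.
from datasets import load_dataset

dataset = load_dataset("lmqg/qg_dequad")
print(dataset)              # available splits and their column names
print(dataset["train"][0])  # one raw training example
```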

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 7
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
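
The list above maps onto Transformers training arguments roughly as sketched below; the output directory and anything not listed (evaluation and saving strategy, logging, and so on) are illustrative assumptions rather than details taken from the original training script.

```python
# Hedged sketch of the training configuration implied by the hyperparameters above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-large-128",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,  # effective (total) train batch size of 128
    num_train_epochs=20,
    seed=7,
    optim="adafactor",
    lr_scheduler_type="constant",
)
```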

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 3.9659 | 0.99 | 72 | 1.4145 | 7244 | 2547 | 1183 | 565 | 16296 | 14092 | 11888 | 9684 | 44.4526 | 18.0741 | 9.9512 | 5.8344 | 0.7379 | 16296 | 21250 | 0.3213 | 0.1608 | 0.3091 | 0.309 | 0.0136 | 10.8438 | 11.7786 | 0.3139 |
| 1.7081 | 1.99 | 145 | 1.2632 | 7865 | 3037 | 1498 | 759 | 16841 | 14637 | 12433 | 10229 | 46.7015 | 20.7488 | 12.0486 | 7.4201 | 0.7697 | 16841 | 21250 | 0.3577 | 0.189 | 0.3438 | 0.3439 | 0.0181 | 13.2044 | 12.225 | 0.3481 |
| 1.4856 | 3.0 | 218 | 1.1974 | 8608 | 3519 | 1818 | 969 | 17627 | 15423 | 13219 | 11015 | 48.8342 | 22.8166 | 13.7529 | 8.7971 | 0.8142 | 17627 | 21250 | 0.3969 | 0.2181 | 0.381 | 0.3812 | 0.0268 | 15.6014 | 13.0027 | 0.3882 |
| 1.3277 | 4.0 | 291 | 1.1394 | 9018 | 3702 | 1907 | 1029 | 17465 | 15261 | 13057 | 10853 | 51.6347 | 24.2579 | 14.6052 | 9.4812 | 0.8052 | 17465 | 21250 | 0.424 | 0.2321 | 0.4087 | 0.4085 | 0.0313 | 16.4313 | 12.8716 | 0.4156 |
| 1.2314 | 4.99 | 363 | 1.1193 | 9240 | 3869 | 1994 | 1076 | 17794 | 15590 | 13386 | 11182 | 51.9276 | 24.8172 | 14.8962 | 9.6226 | 0.8235 | 17794 | 21250 | 0.4336 | 0.2413 | 0.4183 | 0.418 | 0.0363 | 17.0718 | 13.2137 | 0.4256 |
| 1.1264 | 5.99 | 436 | 1.1086 | 9263 | 3908 | 2055 | 1127 | 17502 | 15298 | 13094 | 10890 | 52.9254 | 25.5458 | 15.6942 | 10.3489 | 0.8072 | 17502 | 21250 | 0.4383 | 0.2452 | 0.4239 | 0.4237 | 0.0372 | 17.4744 | 13.034 | 0.4309 |
| 1.0469 | 7.0 | 509 | 1.1038 | 9434 | 4034 | 2146 | 1189 | 18028 | 15824 | 13620 | 11416 | 52.3297 | 25.4929 | 15.7562 | 10.4152 | 0.8363 | 18028 | 21250 | 0.4433 | 0.2505 | 0.4286 | 0.4282 | 0.039 | 18.0906 | 13.422 | 0.4348 |
| 0.9874 | 8.0 | 582 | 1.0990 | 9746 | 4265 | 2287 | 1285 | 18351 | 16147 | 13943 | 11739 | 53.1088 | 26.4136 | 16.4025 | 10.9464 | 0.8539 | 18351 | 21250 | 0.457 | 0.2627 | 0.4417 | 0.4416 | 0.0454 | 19.1287 | 13.6466 | 0.4498 |
| 0.9488 | 8.99 | 654 | 1.1175 | 9484 | 4062 | 2158 | 1197 | 17831 | 15627 | 13423 | 11219 | 53.1883 | 25.9935 | 16.0769 | 10.6694 | 0.8255 | 17831 | 21250 | 0.4482 | 0.2548 | 0.4338 | 0.4333 | 0.0431 | 18.2172 | 13.2763 | 0.4399 |
| 0.8893 | 9.99 | 727 | 1.1222 | 9650 | 4205 | 2289 | 1289 | 18017 | 15813 | 13609 | 11405 | 53.5605 | 26.592 | 16.8198 | 11.3021 | 0.8357 | 18017 | 21250 | 0.4543 | 0.262 | 0.4396 | 0.4394 | 0.0463 | 19.064 | 13.4251 | 0.4472 |
| 0.8362 | 10.99 | 800 | 1.1342 | 9706 | 4232 | 2279 | 1281 | 18232 | 16028 | 13824 | 11620 | 53.2361 | 26.4038 | 16.4858 | 11.0241 | 0.8474 | 18232 | 21250 | 0.4551 | 0.2632 | 0.4395 | 0.4393 | 0.0472 | 19.052 | 13.6021 | 0.4473 |
| 0.7835 | 12.0 | 873 | 1.1427 | 9802 | 4280 | 2292 | 1285 | 18491 | 16287 | 14083 | 11879 | 53.0096 | 26.2786 | 16.2749 | 10.8174 | 0.8614 | 18491 | 21250 | 0.458 | 0.2634 | 0.4414 | 0.4412 | 0.0472 | 19.169 | 14.0168 | 0.4497 |
| 0.7441 | 12.99 | 945 | 1.1669 | 9816 | 4323 | 2334 | 1294 | 18498 | 16294 | 14090 | 11886 | 53.0652 | 26.5312 | 16.5649 | 10.8868 | 0.8618 | 18498 | 21250 | 0.4577 | 0.2659 | 0.4418 | 0.4417 | 0.0463 | 19.3443 | 13.8348 | 0.4493 |
| 0.7012 | 13.99 | 1018 | 1.1740 | 9856 | 4364 | 2375 | 1360 | 18537 | 16333 | 14129 | 11925 | 53.1693 | 26.7189 | 16.8094 | 11.4046 | 0.8639 | 18537 | 21250 | 0.4591 | 0.2653 | 0.443 | 0.4428 | 0.0476 | 19.7341 | 13.976 | 0.4514 |
| 0.6597 | 14.99 | 1091 | 1.1987 | 9780 | 4292 | 2336 | 1302 | 18468 | 16264 | 14060 | 11856 | 52.9565 | 26.3896 | 16.6145 | 10.9818 | 0.8602 | 18468 | 21250 | 0.457 | 0.2633 | 0.4418 | 0.4416 | 0.0485 | 19.3289 | 13.8802 | 0.4492 |
| 0.6236 | 16.0 | 1164 | 1.2135 | 9931 | 4388 | 2390 | 1359 | 18717 | 16513 | 14309 | 12105 | 53.0587 | 26.573 | 16.7028 | 11.2268 | 0.8734 | 18717 | 21250 | 0.4618 | 0.2682 | 0.4452 | 0.445 | 0.0495 | 19.8055 | 14.044 | 0.4538 |
| 0.5933 | 17.0 | 1237 | 1.2305 | 9806 | 4316 | 2366 | 1348 | 18566 | 16362 | 14158 | 11954 | 52.817 | 26.3782 | 16.7114 | 11.2766 | 0.8654 | 18566 | 21250 | 0.4571 | 0.2628 | 0.4407 | 0.4409 | 0.049 | 19.5893 | 14.0622 | 0.4485 |
| 0.5622 | 17.99 | 1309 | 1.2796 | 9787 | 4306 | 2346 | 1338 | 18559 | 16355 | 14151 | 11947 | 52.7345 | 26.3283 | 16.5783 | 11.1995 | 0.865 | 18559 | 21250 | 0.4549 | 0.2609 | 0.4383 | 0.4382 | 0.0476 | 19.4914 | 13.7763 | 0.447 |
| 0.5275 | 18.99 | 1382 | 1.2833 | 9918 | 4363 | 2374 | 1355 | 18950 | 16746 | 14542 | 12338 | 52.3377 | 26.054 | 16.3251 | 10.9823 | 0.8857 | 18950 | 21250 | 0.4573 | 0.2624 | 0.441 | 0.4408 | 0.0508 | 19.6947 | 14.1647 | 0.4499 |
| 0.4986 | 19.79 | 1440 | 1.3059 | 9879 | 4315 | 2347 | 1324 | 18931 | 16727 | 14523 | 12319 | 52.1842 | 25.7966 | 16.1606 | 10.7476 | 0.8847 | 18931 | 21250 | 0.4564 | 0.2622 | 0.4407 | 0.4403 | 0.0495 | 19.4544 | 14.2827 | 0.4478 |

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3