
german-jeopardy-mt5-large-256

This model is a fine-tuned version of google/mt5-large on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3943
  • Brevity Penalty: 0.9201
  • System Length: 19195
  • Reference Length: 20793
  • ROUGE-1: 43.56
  • ROUGE-2: 23.78
  • ROUGE-L: 41.81
  • ROUGE-Lsum: 41.80
  • Exact Match: 3.13
  • BLEU: 16.43
  • F1: 42.48
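
The brevity penalty above is the standard BLEU one, computed from the system and reference lengths also listed above. A quick check in plain Python (no dependencies beyond the standard library) reproduces the reported value:

```python
import math

# BLEU brevity penalty: penalizes system output shorter than the reference.
# BP = 1 when the system output is at least as long as the reference.
system_length = 19195
reference_length = 20793

bp = 1.0 if system_length >= reference_length else math.exp(1 - reference_length / system_length)
print(f"{bp:.4f}")  # 0.9201, matching the reported Brevity Penalty
```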

Model description

See google/mt5-large for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
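
A minimal usage sketch with 🤗 Transformers follows. The repository id matches this card, but the input format is an assumption: lmqg-style question-generation models typically mark the answer span with `<hl>` tokens inside the context, so verify against the actual training preprocessing if outputs look off.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "GiantTreeG/german-jeopardy-mt5-large-256"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumed lmqg-style input: the answer span is wrapped in <hl> tokens
# inside the context paragraph. Check the dataset preprocessing to confirm.
context = (
    "Die Berliner Mauer trennte von 1961 bis 1989 "
    "<hl> West-Berlin <hl> vom Ostteil der Stadt."
)

inputs = tokenizer(context, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```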

Training and evaluation data

See lmqg/qg_dequad.
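
The dataset can be pulled directly from the Hub. The column names below follow the usual lmqg question-generation schema and are an assumption worth verifying on the dataset card:

```python
from datasets import load_dataset

dataset = load_dataset("lmqg/qg_dequad")
print(dataset)  # expected splits: train / validation / test

# lmqg QG datasets typically expose paired fields such as
# "paragraph_answer" (context with the answer highlighted) and
# "question" (the reference question) -- verify on the dataset card.
example = dataset["train"][0]
print(example.get("paragraph_answer"), example.get("question"))
```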

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 7
  • gradient_accumulation_steps: 256
  • total_train_batch_size: 256
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
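
For reproduction, these settings map naturally onto `Seq2SeqTrainingArguments`. The sketch below is an approximation under that assumption; the exact training script for this card is not published:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters (effective batch size:
# 1 per device * 256 gradient-accumulation steps = 256).
training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-large-256",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=256,
    optim="adafactor",
    lr_scheduler_type="constant",
    num_train_epochs=20,
    seed=7,
)
```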

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.932 | 0.99 | 36 | 2.4510 | 5614 | 1426 | 527 | 204 | 28835 | 26631 | 24427 | 22223 | 19.4694 | 5.3547 | 2.1574 | 0.918 | 1.0 | 28835 | 21250 | 0.1946 | 0.0763 | 0.1843 | 0.1843 | 0.0 | 3.7906 | 11.4306 | 0.2127 |
| 2.3089 | 1.98 | 72 | 1.3964 | 7578 | 2696 | 1244 | 580 | 17203 | 14999 | 12795 | 10591 | 44.0505 | 17.9745 | 9.7225 | 5.4763 | 0.7904 | 17203 | 21250 | 0.3312 | 0.1655 | 0.316 | 0.3162 | 0.01 | 11.3254 | 12.6583 | 0.3246 |
| 1.6778 | 3.0 | 109 | 1.2660 | 7961 | 3020 | 1480 | 747 | 17067 | 14863 | 12659 | 10455 | 46.6456 | 20.3189 | 11.6913 | 7.1449 | 0.7826 | 17067 | 21250 | 0.3608 | 0.1881 | 0.3456 | 0.3454 | 0.0195 | 13.128 | 12.4682 | 0.3517 |
| 1.5383 | 3.99 | 145 | 1.2212 | 7948 | 3121 | 1558 | 796 | 16694 | 14490 | 12286 | 10082 | 47.6099 | 21.539 | 12.6811 | 7.8953 | 0.7612 | 16694 | 21250 | 0.3663 | 0.1989 | 0.3523 | 0.352 | 0.024 | 13.625 | 12.221 | 0.3554 |
| 1.423 | 4.97 | 181 | 1.1706 | 8746 | 3590 | 1840 | 963 | 17765 | 15561 | 13357 | 11153 | 49.2316 | 23.0705 | 13.7755 | 8.6344 | 0.8219 | 17765 | 21250 | 0.4033 | 0.2224 | 0.3876 | 0.3874 | 0.0304 | 15.7567 | 13.0277 | 0.3941 |
| 1.2861 | 5.99 | 218 | 1.1327 | 8885 | 3646 | 1864 | 1005 | 17406 | 15202 | 12998 | 10794 | 51.0456 | 23.9837 | 14.3407 | 9.3107 | 0.8018 | 17406 | 21250 | 0.4181 | 0.2295 | 0.4022 | 0.402 | 0.0331 | 16.123 | 12.9142 | 0.4092 |
| 1.2372 | 6.98 | 254 | 1.1248 | 9122 | 3824 | 1997 | 1084 | 17310 | 15106 | 12902 | 10698 | 52.6979 | 25.3144 | 15.4782 | 10.1327 | 0.7964 | 17310 | 21250 | 0.4313 | 0.239 | 0.4175 | 0.4172 | 0.0358 | 17.0334 | 12.8412 | 0.4236 |
| 1.1307 | 8.0 | 291 | 1.0998 | 9423 | 4019 | 2136 | 1190 | 18074 | 15870 | 13666 | 11462 | 52.1357 | 25.3245 | 15.63 | 10.3821 | 0.8389 | 18074 | 21250 | 0.441 | 0.249 | 0.4255 | 0.4252 | 0.0404 | 18.0474 | 13.4138 | 0.4327 |
| 1.0982 | 8.99 | 327 | 1.1052 | 9450 | 4003 | 2147 | 1184 | 18145 | 15941 | 13737 | 11533 | 52.0805 | 25.1113 | 15.6293 | 10.2662 | 0.8427 | 18145 | 21250 | 0.4427 | 0.2492 | 0.4266 | 0.4261 | 0.0426 | 18.0367 | 13.4465 | 0.4344 |
| 1.0449 | 9.98 | 363 | 1.0996 | 9471 | 4036 | 2149 | 1180 | 18067 | 15863 | 13659 | 11455 | 52.4215 | 25.4429 | 15.7332 | 10.3012 | 0.8385 | 18067 | 21250 | 0.4422 | 0.2477 | 0.4261 | 0.4257 | 0.0404 | 18.0793 | 13.333 | 0.4341 |
| 0.9686 | 10.99 | 400 | 1.1012 | 9612 | 4165 | 2240 | 1233 | 17983 | 15779 | 13575 | 11371 | 53.4505 | 26.3958 | 16.5009 | 10.8434 | 0.8339 | 17983 | 21250 | 0.4534 | 0.2591 | 0.4381 | 0.4378 | 0.0449 | 18.6914 | 13.3534 | 0.4458 |
| 0.9465 | 11.98 | 436 | 1.1027 | 9670 | 4154 | 2229 | 1239 | 18217 | 16013 | 13809 | 11605 | 53.0823 | 25.9414 | 16.1416 | 10.6764 | 0.8466 | 18217 | 21250 | 0.4531 | 0.258 | 0.4377 | 0.4374 | 0.0445 | 18.6863 | 13.5912 | 0.4452 |
| 0.9025 | 12.97 | 472 | 1.1124 | 9627 | 4155 | 2241 | 1247 | 18076 | 15872 | 13668 | 11464 | 53.2585 | 26.1782 | 16.396 | 10.8775 | 0.839 | 18076 | 21250 | 0.4531 | 0.2583 | 0.4386 | 0.4382 | 0.0436 | 18.7344 | 13.5259 | 0.4452 |
| 0.8402 | 13.99 | 509 | 1.1392 | 9425 | 4071 | 2176 | 1207 | 17339 | 15135 | 12931 | 10727 | 54.3572 | 26.8979 | 16.8278 | 11.252 | 0.7981 | 17339 | 21250 | 0.4495 | 0.2568 | 0.4365 | 0.4358 | 0.0445 | 18.3062 | 12.9129 | 0.4417 |
| 0.8282 | 14.98 | 545 | 1.1227 | 9803 | 4274 | 2316 | 1305 | 18652 | 16448 | 14244 | 12040 | 52.5574 | 25.9849 | 16.2595 | 10.8389 | 0.87 | 18652 | 21250 | 0.4573 | 0.2627 | 0.4418 | 0.4414 | 0.0463 | 19.2695 | 14.0104 | 0.4496 |
| 0.7694 | 16.0 | 582 | 1.1394 | 9740 | 4240 | 2299 | 1296 | 18281 | 16077 | 13873 | 11669 | 53.2794 | 26.3731 | 16.5718 | 11.1064 | 0.8501 | 18281 | 21250 | 0.4572 | 0.2629 | 0.4411 | 0.4412 | 0.0476 | 19.1704 | 13.6475 | 0.4492 |
| 0.7589 | 16.99 | 618 | 1.1497 | 9663 | 4140 | 2214 | 1232 | 18412 | 16208 | 14004 | 11800 | 52.4821 | 25.5429 | 15.8098 | 10.4407 | 0.8572 | 18412 | 21250 | 0.4515 | 0.2561 | 0.4359 | 0.4358 | 0.044 | 18.5906 | 13.7926 | 0.4432 |
| 0.724 | 17.98 | 654 | 1.1680 | 9743 | 4246 | 2316 | 1300 | 18402 | 16198 | 13994 | 11790 | 52.9453 | 26.2131 | 16.5499 | 11.0263 | 0.8566 | 18402 | 21250 | 0.4562 | 0.2625 | 0.4408 | 0.441 | 0.0472 | 19.2167 | 13.7214 | 0.4474 |
| 0.6755 | 18.99 | 691 | 1.1874 | 9722 | 4266 | 2351 | 1341 | 18272 | 16068 | 13864 | 11660 | 53.2071 | 26.5497 | 16.9576 | 11.5009 | 0.8496 | 18272 | 21250 | 0.4559 | 0.2639 | 0.4417 | 0.4413 | 0.0495 | 19.4647 | 13.6071 | 0.4469 |
| 0.657 | 19.79 | 720 | 1.1845 | 9920 | 4361 | 2402 | 1373 | 18884 | 16680 | 14476 | 12272 | 52.5312 | 26.1451 | 16.593 | 11.1881 | 0.8822 | 18884 | 21250 | 0.4594 | 0.2647 | 0.4423 | 0.4421 | 0.0467 | 19.8248 | 14.2001 | 0.4508 |

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3