german-jeopardy-mt5-base-128

This model is a fine-tuned version of google/mt5-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.56
  • Brevity Penalty: 0.8709 (see the quick check after this list)
  • System Length: 18267
  • Reference Length: 20793
  • ROUGE-1: 40.45
  • ROUGE-2: 21.49
  • ROUGE-L: 39.02
  • ROUGE-Lsum: 39.01
  • Exact Match: 2.68
  • BLEU: 14.62
  • F1: 39.47
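The reported brevity penalty follows the standard BLEU definition, BP = exp(1 - r/c), where r is the reference length and c the system length, applied because the system output is shorter than the reference. A quick check in Python:

```python
# Quick check of the reported brevity penalty using the standard BLEU
# definition BP = exp(1 - r/c) for system output shorter than the reference.
import math

system_length = 18267      # c, from the results above
reference_length = 20793   # r, from the results above
bp = math.exp(1 - reference_length / system_length)
print(round(bp, 4))        # 0.8709, matching the reported value
```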

Model description

See google/mt5-base for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
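A minimal usage sketch with transformers follows. The checkpoint name is taken from this repository; the exact input format the model expects (for example, whether the target answer must be highlighted in the context) is not documented in this card, so the plain-context prompt below is an assumption.

```python
# Minimal usage sketch, assuming this repository's checkpoint name and a
# plain German context as input; the prompt format used during
# fine-tuning (e.g. answer highlighting) is an assumption.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "GiantTreeG/german-jeopardy-mt5-base-128"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = (
    "Das Brandenburger Tor ist ein frühklassizistisches Triumphtor "
    "in Berlin, das Ende des 18. Jahrhunderts errichtet wurde."
)
inputs = tokenizer(context, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```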

Training and evaluation data

See lmqg/qg_dequad.
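A short sketch for loading and inspecting the dataset with the datasets library; it assumes the dataset is available on the Hugging Face Hub under the name above, and prints whatever splits and fields the Hub dataset defines.

```python
# Minimal sketch for inspecting the training data; assumes the dataset
# is available on the Hugging Face Hub under this name.
from datasets import load_dataset

dataset = load_dataset("lmqg/qg_dequad")
print(dataset)              # split names and sizes
print(dataset["train"][0])  # one example with its fields
```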

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto transformers training arguments follows the list:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 7
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
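A minimal sketch of how these settings map onto Seq2SeqTrainingArguments from transformers. The actual training script is not part of this card, so treat this as an illustration under that assumption; output_dir is hypothetical.

```python
# Illustrative mapping of the listed hyperparameters onto
# Seq2SeqTrainingArguments; output_dir is hypothetical and the actual
# training script may have differed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-base-128",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=7,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size 128
    optim="adafactor",
    lr_scheduler_type="constant",
    num_train_epochs=20,
)
```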

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.6905 | 0.99 | 72 | 2.0972 | 5515 | 1394 | 522 | 191 | 28172 | 25968 | 23764 | 21560 | 19.5762 | 5.3681 | 2.1966 | 0.8859 | 1.0 | 28172 | 21250 | 0.1942 | 0.0761 | 0.1837 | 0.1841 | 0.0 | 3.7816 | 11.2786 | 0.2106 |
| 2.4978 | 1.99 | 145 | 1.6211 | 7079 | 2339 | 1027 | 446 | 16544 | 14340 | 12136 | 9932 | 42.7889 | 16.311 | 8.4624 | 4.4905 | 0.7524 | 16544 | 21250 | 0.3097 | 0.1455 | 0.2971 | 0.2969 | 0.01 | 9.6021 | 12.0159 | 0.3032 |
| 2.1021 | 3.0 | 218 | 1.5342 | 7507 | 2637 | 1222 | 575 | 17211 | 15007 | 12803 | 10599 | 43.6175 | 17.5718 | 9.5446 | 5.425 | 0.7908 | 17211 | 21250 | 0.3304 | 0.1642 | 0.3172 | 0.3171 | 0.0141 | 11.162 | 12.6375 | 0.3228 |
| 1.9208 | 4.0 | 291 | 1.4862 | 7599 | 2755 | 1296 | 620 | 16871 | 14667 | 12463 | 10259 | 45.0418 | 18.7837 | 10.3988 | 6.0435 | 0.7714 | 16871 | 21250 | 0.3377 | 0.1721 | 0.3232 | 0.3229 | 0.015 | 11.7136 | 12.3938 | 0.33 |
| 1.8135 | 4.99 | 363 | 1.4626 | 7831 | 2955 | 1424 | 694 | 17184 | 14980 | 12776 | 10572 | 45.5715 | 19.7263 | 11.1459 | 6.5645 | 0.7893 | 17184 | 21250 | 0.3497 | 0.1837 | 0.3358 | 0.3354 | 0.0177 | 12.6402 | 12.6366 | 0.3417 |
| 1.6907 | 5.99 | 436 | 1.4392 | 7872 | 3023 | 1482 | 740 | 16907 | 14703 | 12499 | 10295 | 46.5606 | 20.5604 | 11.8569 | 7.188 | 0.7735 | 16907 | 21250 | 0.3566 | 0.1896 | 0.3432 | 0.343 | 0.0177 | 13.0722 | 12.564 | 0.3483 |
| 1.6159 | 6.99 | 509 | 1.4288 | 7981 | 3128 | 1542 | 773 | 17016 | 14812 | 12608 | 10404 | 46.9029 | 21.118 | 12.2303 | 7.4298 | 0.7797 | 17016 | 21250 | 0.363 | 0.1952 | 0.3504 | 0.3502 | 0.0191 | 13.5053 | 12.5749 | 0.3543 |
| 1.556 | 8.0 | 582 | 1.4132 | 8014 | 3046 | 1496 | 748 | 17320 | 15116 | 12912 | 10708 | 46.2702 | 20.1508 | 11.5861 | 6.9854 | 0.797 | 17320 | 21250 | 0.3632 | 0.1903 | 0.3489 | 0.3491 | 0.0222 | 13.2095 | 12.7641 | 0.355 |
| 1.4951 | 9.0 | 655 | 1.3926 | 8342 | 3271 | 1622 | 819 | 17178 | 14974 | 12770 | 10566 | 48.5621 | 21.8445 | 12.7016 | 7.7513 | 0.789 | 17178 | 21250 | 0.3843 | 0.2059 | 0.3704 | 0.3704 | 0.0218 | 14.1831 | 12.7654 | 0.3769 |
| 1.4522 | 9.99 | 727 | 1.3769 | 8639 | 3449 | 1740 | 891 | 17708 | 15504 | 13300 | 11096 | 48.7859 | 22.2459 | 13.0827 | 8.0299 | 0.8187 | 17708 | 21250 | 0.3972 | 0.2129 | 0.3821 | 0.3823 | 0.024 | 15.0442 | 13.1016 | 0.3895 |
| 1.3663 | 10.99 | 800 | 1.3677 | 8736 | 3468 | 1747 | 924 | 17674 | 15470 | 13266 | 11062 | 49.4285 | 22.4176 | 13.169 | 8.3529 | 0.8168 | 17674 | 21250 | 0.4027 | 0.215 | 0.3871 | 0.387 | 0.0245 | 15.2622 | 13.0399 | 0.3946 |
| 1.3122 | 11.99 | 873 | 1.3521 | 8833 | 3533 | 1780 | 915 | 17927 | 15723 | 13519 | 11315 | 49.272 | 22.4703 | 13.1667 | 8.0866 | 0.8308 | 17927 | 21250 | 0.4055 | 0.219 | 0.3915 | 0.3915 | 0.0222 | 15.3943 | 13.3494 | 0.3975 |
| 1.2641 | 13.0 | 946 | 1.3494 | 9048 | 3668 | 1864 | 989 | 18242 | 16038 | 13834 | 11630 | 49.5998 | 22.8707 | 13.474 | 8.5039 | 0.848 | 18242 | 21250 | 0.4165 | 0.2265 | 0.4011 | 0.401 | 0.0268 | 16.1011 | 13.5508 | 0.408 |
| 1.2359 | 13.99 | 1018 | 1.3488 | 9075 | 3709 | 1907 | 1013 | 18098 | 15894 | 13690 | 11486 | 50.1437 | 23.3359 | 13.9299 | 8.8194 | 0.8402 | 18098 | 21250 | 0.4195 | 0.2298 | 0.4041 | 0.4038 | 0.0259 | 16.3595 | 13.5681 | 0.4113 |
| 1.1754 | 14.99 | 1091 | 1.3482 | 9182 | 3777 | 1957 | 1048 | 18366 | 16162 | 13958 | 11754 | 49.9946 | 23.3696 | 14.0206 | 8.9161 | 0.8547 | 18366 | 21250 | 0.4227 | 0.2314 | 0.406 | 0.4058 | 0.0268 | 16.7083 | 13.6534 | 0.4145 |
| 1.1367 | 15.99 | 1164 | 1.3501 | 9164 | 3761 | 1935 | 1033 | 18310 | 16106 | 13902 | 11698 | 50.0492 | 23.3515 | 13.9189 | 8.8306 | 0.8517 | 18310 | 21250 | 0.4225 | 0.2316 | 0.4078 | 0.4079 | 0.0245 | 16.5803 | 13.6152 | 0.4147 |
| 1.096 | 17.0 | 1237 | 1.3586 | 9126 | 3712 | 1922 | 1050 | 18277 | 16073 | 13869 | 11665 | 49.9316 | 23.0946 | 13.8582 | 9.0013 | 0.8499 | 18277 | 21250 | 0.4217 | 0.2304 | 0.4066 | 0.4066 | 0.0295 | 16.5513 | 13.6325 | 0.4141 |
| 1.0571 | 18.0 | 1310 | 1.3658 | 9087 | 3707 | 1923 | 1033 | 18179 | 15975 | 13771 | 11567 | 49.9862 | 23.205 | 13.9641 | 8.9306 | 0.8446 | 18179 | 21250 | 0.4196 | 0.2301 | 0.4049 | 0.4049 | 0.029 | 16.4708 | 13.5172 | 0.4116 |
| 1.036 | 18.99 | 1382 | 1.3672 | 9206 | 3806 | 1976 | 1059 | 18332 | 16128 | 13924 | 11720 | 50.2182 | 23.5987 | 14.1913 | 9.0358 | 0.8528 | 18332 | 21250 | 0.4254 | 0.2348 | 0.4106 | 0.4107 | 0.0309 | 16.8386 | 13.7205 | 0.4174 |
| 0.9785 | 19.79 | 1440 | 1.3819 | 9180 | 3796 | 1973 | 1059 | 18164 | 15960 | 13756 | 11552 | 50.5395 | 23.7845 | 14.3428 | 9.1672 | 0.8438 | 18164 | 21250 | 0.4254 | 0.2344 | 0.4116 | 0.4117 | 0.0327 | 16.8234 | 13.5113 | 0.4172 |

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3