german-jeopardy-mt5-base-256

This model is a fine-tuned version of google/mt5-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

Loss: 1.51
Brevity Penalty: 0.8658
System Length: 18174
Reference Length: 20793
ROUGE-1: 38.80
ROUGE-2: 20.27
ROUGE-L: 37.34
ROUGE-Lsum: 37.32
Exact Match: 2.81
BLEU: 13.70
F1: 37.79
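
The Brevity Penalty above follows from the two lengths through the standard BLEU definition: BP = exp(1 - reference_length / system_length) when the system output is shorter than the reference, and 1 otherwise. A minimal sketch that recomputes it from the reported lengths (the function name is illustrative and not part of any library):

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty: 1.0 unless the system output is shorter."""
    if system_length >= reference_length:
        return 1.0
    return math.exp(1.0 - reference_length / system_length)


# Lengths taken from the summary list above.
bp = brevity_penalty(system_length=18174, reference_length=20793)
print(f"{bp:.4f}")  # 0.8658, matching the reported Brevity Penalty
```

A value below 1 means the generated questions are, in total, shorter than the reference questions, which lowers BLEU proportionally.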

Model description

See google/mt5-base for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
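
A minimal usage sketch with the Transformers library follows. It assumes the Hub repository id is GiantTreeG/german-jeopardy-mt5-base-256 and that the model accepts plain German text as input; this card does not document the expected input format (for example, whether the answer span must be marked), so treat the preprocessing as an assumption to verify. The function name and the max_new_tokens default are illustrative.

```python
MODEL_ID = "GiantTreeG/german-jeopardy-mt5-base-256"  # assumed Hub repository id


def generate_question(text: str, max_new_tokens: int = 32) -> str:
    """Generate one question for `text`. Downloads the model on first use."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Assumption: raw text is passed as-is; the card does not specify a prompt format.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_question("Berlin ist die Hauptstadt von Deutschland.") would return a generated string; its quality depends on whether the input matches the format used during fine-tuning.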

Training and evaluation data

See lmqg/qg_dequad.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 7
gradient_accumulation_steps: 64
total_train_batch_size: 256
optimizer: Adafactor
lr_scheduler_type: constant
num_epochs: 20
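
The total_train_batch_size is the product of the per-device batch size, the gradient accumulation steps, and the number of devices. The sketch below recomputes it and, as a rough estimate only, derives the optimizer steps per epoch from the last row of the results table (step 720 at epoch 19.79). The variable names are illustrative, and the per-epoch example count is an approximation because the epoch value is rounded.

```python
train_batch_size = 4
gradient_accumulation_steps = 64
num_devices = 1  # a single RTX 3090, per the model description

# Number of examples that contribute to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 256

# Rough estimate from the final logged row: step 720 at epoch 19.79.
final_step, final_epoch = 720, 19.79
steps_per_epoch = final_step / final_epoch
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size
print(f"~{steps_per_epoch:.1f} steps/epoch, ~{approx_examples_per_epoch:.0f} examples/epoch")
```

Gradient accumulation is what lets a batch of 256 fit on a 24 GB card: only 4 examples are held in memory per forward and backward pass.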

Training results

In the table below, ROUGE, Exact Match, and F1 are reported as fractions between 0 and 1, while the Precisions and BLEU columns are percentages.

Training Loss	Epoch	Step	Validation Loss	Counts 1	Counts 2	Counts 3	Counts 4	Totals 1	Totals 2	Totals 3	Totals 4	Precisions 1	Precisions 2	Precisions 3	Precisions 4	Brevity Penalty	System Length	Reference Length	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum	Exact Match	BLEU	Mean Generated Length	F1
8.9608	0.99	36	2.8883	2306	50	12	2	17876	15672	13468	11264	12.9	0.319	0.0891	0.0178	0.828	17876	21250	0.0081	0.0022	0.0078	0.0078	0.0	0.2352	3.1969	0.0092
3.2364	1.98	72	1.9242	6125	1727	687	277	21152	18948	16744	14540	28.9571	9.1144	4.103	1.9051	0.9954	21152	21250	0.2457	0.1026	0.2345	0.2346	0.0018	6.7083	11.8072	0.2514
2.4963	3.0	109	1.6558	6903	2271	975	409	16537	14333	12129	9925	41.7428	15.8446	8.0386	4.1209	0.752	16537	21250	0.2966	0.1415	0.2854	0.2852	0.01	9.1493	12.176	0.2909
2.2314	3.98	145	1.5771	7160	2440	1098	501	16627	14423	12219	10015	43.0625	16.9174	8.986	5.0025	0.7573	16627	21250	0.314	0.1535	0.3028	0.3028	0.0136	10.187	12.157	0.3069
2.0578	4.97	181	1.5347	7447	2625	1214	566	17305	15101	12897	10693	43.0338	17.383	9.413	5.2932	0.7961	17305	21250	0.3286	0.1628	0.3146	0.3146	0.0163	11.0621	12.5585	0.32
1.8928	5.99	218	1.5128	7396	2659	1257	611	16598	14394	12190	9986	44.5596	18.473	10.3117	6.1186	0.7556	16598	21250	0.3326	0.1684	0.3198	0.3198	0.0177	11.4063	12.1692	0.3234
1.8573	6.98	254	1.4736	7531	2758	1313	641	16728	14524	12320	10116	45.0203	18.9893	10.6575	6.3365	0.7631	16728	21250	0.3349	0.1717	0.3216	0.3216	0.0163	11.8292	12.3035	0.327
1.7361	8.0	291	1.4544	7658	2849	1368	668	16928	14724	12520	10316	45.2387	19.3494	10.9265	6.4754	0.7747	16928	21250	0.3414	0.1762	0.3283	0.3284	0.0181	12.2208	12.4628	0.3334
1.7162	8.99	327	1.4459	7703	2891	1390	694	16795	14591	12387	10183	45.8648	19.8136	11.2214	6.8153	0.767	16795	21250	0.3454	0.1785	0.3325	0.3323	0.0159	12.4536	12.4174	0.3374
1.6589	9.98	363	1.4383	7889	2983	1449	719	17376	15172	12968	10764	45.4017	19.6612	11.1737	6.6797	0.8002	17376	21250	0.3519	0.1816	0.3375	0.3372	0.0172	12.8553	12.7101	0.3435
1.5571	10.99	400	1.4214	7889	2994	1457	736	17185	14981	12777	10573	45.9063	19.9853	11.4033	6.9611	0.7894	17185	21250	0.3529	0.1845	0.3392	0.3393	0.02	12.9671	12.6466	0.3457
1.5502	11.98	436	1.4135	7930	3008	1477	741	16868	14664	12460	10256	47.0121	20.5128	11.8539	7.225	0.7712	16868	21250	0.3619	0.189	0.3492	0.3491	0.0213	13.0741	12.4483	0.3541
1.4564	13.0	473	1.3943	8268	3200	1616	837	17929	15725	13521	11317	46.1152	20.3498	11.9518	7.396	0.8309	17929	21250	0.3729	0.1974	0.3578	0.3576	0.0218	14.1014	13.2441	0.3647
1.4522	13.99	509	1.3953	8047	3130	1564	811	16789	14585	12381	10177	47.9302	21.4604	12.6323	7.9689	0.7667	16789	21250	0.3712	0.197	0.3582	0.3581	0.0227	13.7526	12.515	0.3627
1.407	14.98	545	1.3759	8498	3358	1703	877	17923	15719	13515	11311	47.4139	21.3627	12.6008	7.7535	0.8306	17923	21250	0.3856	0.2063	0.3709	0.3706	0.0213	14.7315	13.2849	0.3772
1.3294	15.99	582	1.3776	8481	3407	1721	883	17451	15247	13043	10839	48.5989	22.3454	13.1948	8.1465	0.8044	17451	21250	0.3907	0.211	0.3766	0.3766	0.024	14.868	12.9142	0.3822
1.3294	16.98	618	1.3803	8633	3464	1767	923	18004	15800	13596	11392	47.9505	21.9241	12.9965	8.1022	0.835	18004	21250	0.3946	0.2133	0.3801	0.3798	0.0263	15.2312	13.3103	0.3868
1.2605	18.0	655	1.3710	8560	3376	1695	880	17830	15626	13422	11218	48.009	21.605	12.6285	7.8445	0.8255	17830	21250	0.3922	0.2092	0.3778	0.3775	0.0231	14.779	13.1665	0.3846
1.2667	18.99	691	1.3694	8664	3455	1733	882	17834	15630	13426	11222	48.5814	22.1049	12.9078	7.8596	0.8257	17834	21250	0.3987	0.2138	0.3853	0.3851	0.0227	15.0008	13.2232	0.3906
1.2074	19.79	720	1.3658	8770	3465	1737	880	18039	15835	13631	11427	48.6169	21.8819	12.743	7.7011	0.8369	18039	21250	0.4025	0.215	0.3883	0.3879	0.0227	15.0442	13.4424	0.3941
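
The Counts, Totals, Precisions, Brevity Penalty, and BLEU columns are related by the standard corpus-level BLEU formula: each precision is Counts n / Totals n, and BLEU is the brevity penalty times the geometric mean of the four precisions. A short sketch that recomputes the final row (step 720) from its counts, assuming no smoothing is needed because all counts are nonzero:

```python
import math

# Final table row (step 720): matched and total n-gram counts for n = 1..4.
counts = [8770, 3465, 1737, 880]
totals = [18039, 15835, 13631, 11427]
system_length, reference_length = 18039, 21250

# Modified n-gram precisions, as fractions.
precisions = [c / t for c, t in zip(counts, totals)]

# Brevity penalty: penalizes output that is shorter than the reference.
if system_length >= reference_length:
    bp = 1.0
else:
    bp = math.exp(1.0 - reference_length / system_length)

# BLEU = BP * geometric mean of the precisions, scaled to a percentage.
log_mean = sum(math.log(p) for p in precisions) / len(precisions)
bleu = 100.0 * bp * math.exp(log_mean)

print(f"BP={bp:.4f} BLEU={bleu:.2f}")
```

The result should agree with the row's Brevity Penalty (0.8369) and BLEU (15.0442) up to rounding.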

Framework versions

Transformers 4.32.1
PyTorch 2.1.0
Datasets 2.12.0
Tokenizers 0.13.3

GiantTreeG/german-jeopardy-mt5-base-256