
german-jeopardy-mt5-large-1k-64-constant

This model is a fine-tuned version of google/mt5-large on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8162
  • Brevity Penalty: 0.9152 (see the worked check after this list)
  • System Length: 19102
  • Reference Length: 20793
  • ROUGE-1: 41.68
  • ROUGE-2: 22.07
  • ROUGE-L: 40.20
  • ROUGE-Lsum: 40.19
  • Exact Match: 2.77
  • BLEU: 15.09
  • F1: 40.69
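
The brevity penalty follows the standard corpus-BLEU definition, and the reported value can be checked directly from the system length c = 19102 and reference length r = 20793 above:

$$
BP = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}
\qquad
e^{\,1 - 20793/19102} \approx e^{-0.0885} \approx 0.915
$$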

Model description

See google/mt5-large for the model architecture; the fine-tuned checkpoint has about 1.23B parameters.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
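
A minimal generation sketch using transformers is shown below. The input format is an assumption: lmqg-style models usually expect a "generate question:"-prefixed context with the answer span wrapped in <hl> tokens, so check the training preprocessing before relying on it.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical checkpoint id; substitute the actual repository path.
model_id = "GiantTreeG/german-jeopardy-mt5-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumed lmqg-style prompt: highlight the answer span with <hl> markers.
text = (
    "generate question: Die Hauptstadt von Deutschland ist <hl> Berlin <hl>, "
    "eine Stadt mit über 3,7 Millionen Einwohnern."
)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```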

Training and evaluation data

See lmqg/qg_dequad.
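
To inspect the data, the dataset loads directly from the Hub:

```python
from datasets import load_dataset

# German question-generation dataset (adapted from GermanQuAD).
dataset = load_dataset("lmqg/qg_dequad")
print(dataset)              # available splits and sizes
print(dataset["train"][0])  # one example record
```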

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding Trainer configuration follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 7
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 64
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
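
These settings map onto the transformers Trainer roughly as sketched below; this is an assumed reconstruction, since the actual training script is not included here. Note that the total train batch size of 64 is simply the per-device batch size of 1 multiplied by 64 gradient-accumulation steps.

```python
from transformers import Seq2SeqTrainingArguments

# Assumed reconstruction of the configuration listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-large-1k-64-constant",
    learning_rate=1e-4,
    per_device_train_batch_size=1,   # effective batch size: 1 * 64 = 64
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=64,
    optim="adafactor",               # memory-frugal optimizer, fits a 24 GB GPU
    lr_scheduler_type="constant",
    num_train_epochs=20,
    seed=7,
    predict_with_generate=True,
)
```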

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.732 | 1.0 | 145 | 1.2989 | 7779 | 2893 | 1393 | 685 | 17029 | 14825 | 12621 | 10417 | 45.6809 | 19.5143 | 11.0372 | 6.5758 | 0.7805 | 17029 | 21250 | 0.3487 | 0.1796 | 0.3329 | 0.3327 | 0.0168 | 12.4473 | 12.2523 | 0.3393 |
| 1.5514 | 2.0 | 291 | 1.2100 | 8297 | 3336 | 1711 | 899 | 17145 | 14941 | 12737 | 10533 | 48.3931 | 22.3278 | 13.4333 | 8.5351 | 0.7871 | 17145 | 21250 | 0.3839 | 0.2089 | 0.3688 | 0.369 | 0.025 | 14.7663 | 12.441 | 0.3743 |
| 1.3546 | 3.0 | 435 | 1.1428 | 8930 | 3713 | 1905 | 1022 | 17018 | 14814 | 12610 | 10406 | 52.4739 | 25.0641 | 15.1071 | 9.8213 | 0.7798 | 17018 | 21250 | 0.4225 | 0.2345 | 0.4075 | 0.4074 | 0.034 | 16.3903 | 12.6021 | 0.4155 |
| 1.1969 | 4.0 | 581 | 1.1113 | 9456 | 3994 | 2096 | 1157 | 18171 | 15967 | 13763 | 11559 | 52.039 | 25.0141 | 15.2292 | 10.0095 | 0.8441 | 18171 | 21250 | 0.4409 | 0.246 | 0.4251 | 0.4251 | 0.0386 | 17.8161 | 13.4061 | 0.4334 |
| 1.0876 | 5.0 | 726 | 1.1032 | 9606 | 4162 | 2233 | 1243 | 18179 | 15975 | 13771 | 11567 | 52.8412 | 26.0532 | 16.2152 | 10.7461 | 0.8446 | 18179 | 21250 | 0.4504 | 0.2571 | 0.4356 | 0.4357 | 0.0377 | 18.6911 | 13.5599 | 0.443 |
| 0.9881 | 6.0 | 872 | 1.1119 | 9608 | 4167 | 2235 | 1246 | 18245 | 16041 | 13837 | 11633 | 52.661 | 25.9772 | 16.1523 | 10.7109 | 0.8481 | 18245 | 21250 | 0.4505 | 0.2567 | 0.4348 | 0.4349 | 0.044 | 18.7071 | 13.6978 | 0.4429 |
| 0.9142 | 7.0 | 1017 | 1.1106 | 9757 | 4285 | 2311 | 1310 | 18291 | 16087 | 13883 | 11679 | 53.3432 | 26.6364 | 16.6463 | 11.2167 | 0.8506 | 18291 | 21250 | 0.4587 | 0.2641 | 0.4427 | 0.443 | 0.0495 | 19.3053 | 13.5826 | 0.451 |
| 0.8323 | 8.0 | 1163 | 1.1327 | 9757 | 4300 | 2341 | 1317 | 18293 | 16089 | 13885 | 11681 | 53.3373 | 26.7263 | 16.8599 | 11.2747 | 0.8507 | 18293 | 21250 | 0.4587 | 0.2662 | 0.4429 | 0.4426 | 0.0472 | 19.4102 | 13.6239 | 0.4513 |
| 0.7742 | 9.0 | 1308 | 1.1574 | 9757 | 4273 | 2324 | 1320 | 18273 | 16069 | 13865 | 11661 | 53.3957 | 26.5916 | 16.7616 | 11.3198 | 0.8497 | 18273 | 21250 | 0.4585 | 0.2653 | 0.4431 | 0.443 | 0.049 | 19.3574 | 13.5944 | 0.451 |
| 0.7101 | 10.0 | 1454 | 1.1674 | 9861 | 4403 | 2438 | 1416 | 18641 | 16437 | 14233 | 12029 | 52.8995 | 26.7871 | 17.1292 | 11.7716 | 0.8694 | 18641 | 21250 | 0.4594 | 0.2689 | 0.444 | 0.4435 | 0.0531 | 20.1003 | 13.9133 | 0.4525 |
| 0.6642 | 10.99 | 1599 | 1.1889 | 9868 | 4380 | 2358 | 1337 | 18386 | 16182 | 13978 | 11774 | 53.6713 | 27.0671 | 16.8694 | 11.3555 | 0.8558 | 18386 | 21250 | 0.4622 | 0.2694 | 0.4469 | 0.4466 | 0.0476 | 19.655 | 13.9142 | 0.4551 |
| 0.6067 | 12.0 | 1745 | 1.2207 | 9872 | 4384 | 2408 | 1395 | 18894 | 16690 | 14486 | 12282 | 52.2494 | 26.2672 | 16.6229 | 11.3581 | 0.8828 | 18894 | 21250 | 0.4569 | 0.2667 | 0.441 | 0.4408 | 0.0472 | 19.9169 | 14.2482 | 0.4489 |
| 0.5684 | 12.99 | 1890 | 1.2587 | 9870 | 4356 | 2360 | 1329 | 18901 | 16697 | 14493 | 12289 | 52.2195 | 26.0885 | 16.2837 | 10.8145 | 0.8831 | 18901 | 21250 | 0.4581 | 0.2651 | 0.4414 | 0.4409 | 0.0485 | 19.5451 | 14.2432 | 0.4506 |
| 0.5288 | 14.0 | 2036 | 1.2804 | 9815 | 4360 | 2389 | 1335 | 18367 | 16163 | 13959 | 11755 | 53.4382 | 26.9752 | 17.1144 | 11.3569 | 0.8547 | 18367 | 21250 | 0.4592 | 0.2671 | 0.4443 | 0.4436 | 0.0454 | 19.6648 | 13.7432 | 0.4504 |
| 0.4902 | 14.99 | 2181 | 1.3211 | 9886 | 4407 | 2398 | 1359 | 18777 | 16573 | 14369 | 12165 | 52.6495 | 26.5914 | 16.6887 | 11.1714 | 0.8766 | 18777 | 21250 | 0.4582 | 0.2674 | 0.4426 | 0.4421 | 0.0495 | 19.8138 | 14.1225 | 0.451 |
| 0.4498 | 16.0 | 2327 | 1.3621 | 10008 | 4477 | 2456 | 1381 | 19399 | 17195 | 14991 | 12787 | 51.5903 | 26.0366 | 16.3832 | 10.8 | 0.909 | 19399 | 21250 | 0.4569 | 0.2679 | 0.4415 | 0.4412 | 0.0476 | 20.0703 | 14.3725 | 0.4491 |
| 0.4216 | 16.99 | 2472 | 1.3967 | 10016 | 4483 | 2455 | 1385 | 19125 | 16921 | 14717 | 12513 | 52.3712 | 26.4937 | 16.6814 | 11.0685 | 0.8948 | 19125 | 21250 | 0.4615 | 0.2705 | 0.4457 | 0.4451 | 0.0481 | 20.1319 | 14.3008 | 0.4531 |
| 0.3829 | 18.0 | 2618 | 1.4460 | 9976 | 4407 | 2412 | 1374 | 19464 | 17260 | 15056 | 12852 | 51.2536 | 25.533 | 16.0202 | 10.6909 | 0.9123 | 19464 | 21250 | 0.4556 | 0.2627 | 0.4387 | 0.4385 | 0.0476 | 19.8508 | 14.7046 | 0.4479 |
| 0.3551 | 19.0 | 2764 | 1.4725 | 10010 | 4451 | 2438 | 1385 | 19131 | 16927 | 14723 | 12519 | 52.3235 | 26.2953 | 16.5591 | 11.0632 | 0.8952 | 19131 | 21250 | 0.4606 | 0.2672 | 0.4438 | 0.4434 | 0.0463 | 20.0572 | 14.3807 | 0.4523 |
| 0.3301 | 19.93 | 2900 | 1.5030 | 9858 | 4378 | 2406 | 1368 | 18872 | 16668 | 14464 | 12260 | 52.2361 | 26.2659 | 16.6344 | 11.1582 | 0.8816 | 18872 | 21250 | 0.4569 | 0.2644 | 0.4412 | 0.4405 | 0.0495 | 19.8047 | 14.2795 | 0.4483 |
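
The Counts, Totals, Precisions, Brevity Penalty, System Length, and Reference Length columns are the intermediate n-gram statistics of corpus BLEU; sacrebleu reports exactly these fields. A sketch of recomputing the headline metrics with the evaluate library follows (an assumed metric stack; the tokenization and settings used for this card may differ):

```python
import evaluate

# sacrebleu returns score, counts, totals, precisions, bp, sys_len, ref_len.
bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

predictions = ["Wie heißt die Hauptstadt von Deutschland?"]
references = [["Was ist die Hauptstadt von Deutschland?"]]

print(bleu.compute(predictions=predictions, references=references))
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))
```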

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3