
german-jeopardy-longt5-large-1k-64-constant

This model is a fine-tuned version of google/long-t5-tglobal-large on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5907
  • Brevity Penalty: 0.9367
  • System Length: 19517
  • Reference Length: 20793
  • ROUGE-1: 32.79
  • ROUGE-2: 14.95
  • ROUGE-L: 31.56
  • ROUGE-Lsum: 31.57
  • Exact Match: 1.36
  • BLEU: 9.50
  • F1: 32.03
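
As a sanity check, the reported brevity penalty can be reproduced from the system and reference lengths above using the standard BLEU definition. This snippet is illustrative and not part of the original evaluation script:

```python
import math

# BLEU brevity penalty: 1.0 when the system output is at least as long as
# the reference, otherwise exp(1 - reference_length / system_length).
system_length = 19517
reference_length = 20793

bp = 1.0 if system_length >= reference_length else math.exp(1 - reference_length / system_length)
print(f"{bp:.4f}")  # 0.9367, matching the value reported above
```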

Model description

See google/long-t5-tglobal-large for details on the model architecture.
The model was fine-tuned on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text, as sketched below; it has not been evaluated on other languages or tasks.
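
A minimal inference sketch with the transformers library. The Hub id is taken from the card title, and both the task prefix and the `<hl>` answer-highlighting markers follow the lmqg question-generation convention; treat them as assumptions about this checkpoint's expected input format:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed Hub id, derived from the card title.
model_id = "GiantTreeG/german-jeopardy-longt5-large-1k-64-constant"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# lmqg-style input: the answer span is wrapped in <hl> markers inside the
# context (assumed convention for this checkpoint).
text = (
    "generate question: Das Brandenburger Tor steht in <hl> Berlin <hl> "
    "und ist eines der bekanntesten Wahrzeichen Deutschlands."
)

inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```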

Training and evaluation data

See lmqg/qg_dequad.
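
The dataset can be pulled directly from the Hugging Face Hub. The column layout below follows the lmqg question-generation datasets and should be verified against the dataset card:

```python
from datasets import load_dataset

# Loads the train/validation/test splits of the German QG dataset.
dataset = load_dataset("lmqg/qg_dequad")
print(dataset)

# Inspect one training example; lmqg QG datasets pair a highlighted
# paragraph with its target question (column names assumed).
print(dataset["train"][0])
```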

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 7
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
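
A sketch of how these settings might be expressed with the transformers Seq2SeqTrainingArguments API. This is reconstructed from the list above rather than the exact training script, and the output_dir is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the listed hyperparameters (output_dir is a placeholder).
training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-longt5-large-1k-64-constant",
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=7,
    gradient_accumulation_steps=32,  # 2 x 32 = total train batch size of 64
    optim="adafactor",
    lr_scheduler_type="constant",
    num_train_epochs=20,
)
```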

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.5987 | 1.0 | 145 | 5.0696 | 3804 | 134 | 2 | 0 | 22913 | 20709 | 18505 | 16301 | 16.6019 | 0.6471 | 0.0108 | 0.0031 | 1.0 | 22913 | 21250 | 0.0783 | 0.007 | 0.0769 | 0.0768 | 0.0 | 0.1374 | 16.2899 | 0.0814 |
| 4.7443 | 2.0 | 291 | 4.2270 | 4022 | 188 | 20 | 0 | 17366 | 15162 | 12958 | 10754 | 23.1602 | 1.2399 | 0.1543 | 0.0046 | 0.7996 | 17366 | 21250 | 0.1028 | 0.012 | 0.0991 | 0.099 | 0.0 | 0.303 | 12.9038 | 0.1073 |
| 4.1412 | 3.0 | 436 | 3.7838 | 3723 | 187 | 26 | 2 | 16515 | 14311 | 12107 | 9903 | 22.5431 | 1.3067 | 0.2148 | 0.0202 | 0.7507 | 16515 | 21250 | 0.0899 | 0.0124 | 0.0886 | 0.0884 | 0.0 | 0.4488 | 12.4769 | 0.0938 |
| 3.6791 | 4.0 | 582 | 3.4246 | 4576 | 549 | 134 | 26 | 21871 | 19667 | 17463 | 15259 | 20.9227 | 2.7915 | 0.7673 | 0.1704 | 1.0 | 21871 | 21250 | 0.1259 | 0.0296 | 0.1204 | 0.1201 | 0.0 | 1.6623 | 14.5676 | 0.1323 |
| 3.3523 | 5.0 | 727 | 3.1723 | 4900 | 796 | 210 | 41 | 19389 | 17185 | 14981 | 12777 | 25.2721 | 4.6319 | 1.4018 | 0.3209 | 0.9085 | 19389 | 21250 | 0.1542 | 0.0449 | 0.1486 | 0.1484 | 0.0005 | 2.4472 | 14.3943 | 0.1585 |
| 3.0161 | 6.0 | 873 | 2.9268 | 5633 | 1182 | 390 | 111 | 19045 | 16841 | 14637 | 12433 | 29.5773 | 7.0186 | 2.6645 | 0.8928 | 0.8907 | 19045 | 21250 | 0.204 | 0.069 | 0.196 | 0.1961 | 0.0045 | 4.1987 | 14.5789 | 0.2074 |
| 2.7639 | 7.0 | 1018 | 2.7601 | 6100 | 1461 | 499 | 165 | 17924 | 15720 | 13516 | 11312 | 34.0326 | 9.2939 | 3.6919 | 1.4586 | 0.8306 | 17924 | 21250 | 0.2409 | 0.0885 | 0.2332 | 0.2331 | 0.0073 | 5.3362 | 13.8553 | 0.2431 |
| 2.5036 | 8.0 | 1164 | 2.5729 | 6765 | 1845 | 701 | 273 | 20179 | 17975 | 15771 | 13567 | 33.525 | 10.2643 | 4.4449 | 2.0122 | 0.9483 | 20179 | 21250 | 0.2682 | 0.1079 | 0.2589 | 0.259 | 0.0059 | 7.0633 | 15.7232 | 0.2689 |
| 2.307 | 8.99 | 1309 | 2.4637 | 7018 | 2047 | 826 | 348 | 19054 | 16850 | 14646 | 12442 | 36.8322 | 12.1484 | 5.6398 | 2.797 | 0.8911 | 19054 | 21250 | 0.2907 | 0.1218 | 0.2799 | 0.2798 | 0.0095 | 8.1681 | 14.8076 | 0.2907 |
| 2.1012 | 10.0 | 1455 | 2.3614 | 7147 | 2127 | 883 | 389 | 18473 | 16269 | 14065 | 11861 | 38.6889 | 13.0739 | 6.278 | 3.2797 | 0.8604 | 18473 | 21250 | 0.3003 | 0.1275 | 0.289 | 0.2888 | 0.0118 | 8.6921 | 14.2736 | 0.3008 |
| 1.9538 | 10.99 | 1600 | 2.2980 | 7481 | 2339 | 997 | 459 | 18524 | 16320 | 14116 | 11912 | 40.3854 | 14.3321 | 7.0629 | 3.8533 | 0.8632 | 18524 | 21250 | 0.3192 | 0.1423 | 0.3064 | 0.3068 | 0.0127 | 9.67 | 14.3757 | 0.3167 |
| 1.7909 | 12.0 | 1746 | 2.2389 | 7675 | 2546 | 1144 | 546 | 18849 | 16645 | 14441 | 12237 | 40.7183 | 15.2959 | 7.9219 | 4.4619 | 0.8804 | 18849 | 21250 | 0.3299 | 0.1528 | 0.3174 | 0.3175 | 0.015 | 10.724 | 14.583 | 0.3279 |
| 1.6691 | 12.99 | 1891 | 2.1813 | 7858 | 2635 | 1179 | 576 | 18643 | 16439 | 14235 | 12031 | 42.1499 | 16.029 | 8.2824 | 4.7876 | 0.8695 | 18643 | 21250 | 0.344 | 0.1626 | 0.33 | 0.33 | 0.0163 | 11.1241 | 14.3848 | 0.3395 |
| 1.5361 | 14.0 | 2037 | 2.1546 | 8016 | 2729 | 1249 | 606 | 18754 | 16550 | 14346 | 12142 | 42.7429 | 16.4894 | 8.7063 | 4.9909 | 0.8754 | 18754 | 21250 | 0.3494 | 0.1664 | 0.3349 | 0.3351 | 0.0163 | 11.5803 | 14.564 | 0.3462 |
| 1.4365 | 14.99 | 2182 | 2.1358 | 8112 | 2839 | 1316 | 647 | 18390 | 16186 | 13982 | 11778 | 44.1109 | 17.5398 | 9.4121 | 5.4933 | 0.856 | 18390 | 21250 | 0.3581 | 0.1761 | 0.3448 | 0.3448 | 0.02 | 12.1055 | 14.1656 | 0.3538 |
| 1.3263 | 16.0 | 2328 | 2.1190 | 8381 | 2990 | 1430 | 731 | 18892 | 16688 | 14484 | 12280 | 44.3627 | 17.9171 | 9.873 | 5.9528 | 0.8827 | 18892 | 21250 | 0.3681 | 0.1831 | 0.3532 | 0.3534 | 0.0209 | 12.9765 | 14.5445 | 0.363 |
| 1.2329 | 17.0 | 2474 | 2.1202 | 8449 | 3101 | 1520 | 786 | 18612 | 16408 | 14204 | 12000 | 45.3954 | 18.8993 | 10.7012 | 6.55 | 0.8678 | 18612 | 21250 | 0.3743 | 0.1901 | 0.3603 | 0.3603 | 0.0227 | 13.5903 | 14.1779 | 0.3692 |
| 1.1557 | 18.0 | 2619 | 2.1282 | 8406 | 3154 | 1558 | 804 | 17958 | 15754 | 13550 | 11346 | 46.8092 | 20.0203 | 11.4982 | 7.0862 | 0.8325 | 17958 | 21250 | 0.3761 | 0.194 | 0.3633 | 0.3636 | 0.0277 | 13.8388 | 13.677 | 0.371 |
| 1.0658 | 19.0 | 2765 | 2.1232 | 8614 | 3241 | 1610 | 839 | 18955 | 16751 | 14547 | 12343 | 45.4445 | 19.3481 | 11.0676 | 6.7974 | 0.886 | 18955 | 21250 | 0.3803 | 0.196 | 0.3654 | 0.3656 | 0.0272 | 14.2084 | 14.3816 | 0.3749 |
| 0.9944 | 19.93 | 2900 | 2.1203 | 8658 | 3273 | 1625 | 859 | 18853 | 16649 | 14445 | 12241 | 45.9237 | 19.6588 | 11.2496 | 7.0174 | 0.8806 | 18853 | 21250 | 0.3833 | 0.1977 | 0.369 | 0.3691 | 0.0268 | 14.3883 | 14.2881 | 0.3775 |

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3