---
language:
- de
tags:
- question-generation
- german
- text2text-generation
- generated_from_trainer
datasets:
- lmqg/qg_dequad
metrics:
- bleu4
- f1
- rouge
- exact_match
model-index:
- name: german-jeopardy-mt5-large
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: lmqg/qg_dequad
      type: default
      args: default
    metrics:
    - name: BLEU-4
      type: bleu4
      value: 15.09
    - name: F1
      type: f1
      value: 40.69
    - name: ROUGE-1
      type: rouge1
      value: 41.68
    - name: ROUGE-2
      type: rouge2
      value: 22.07
    - name: ROUGE-L
      type: rougel
      value: 40.20
    - name: ROUGE-Lsum
      type: rougelsum
      value: 40.19
    - name: Exact Match
      type: exact_match
      value: 2.77
---
# german-jeopardy-mt5-large-1k-64-constant

This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on the [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.8162
- Brevity Penalty: 0.9152
- System Length: 19102
- Reference Length: 20793
- ROUGE-1: 41.68
- ROUGE-2: 22.07
- ROUGE-L: 40.20
- ROUGE-Lsum: 40.19
- Exact Match: 2.77
- BLEU: 15.09
- F1: 40.69

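The reported brevity penalty follows directly from the system and reference lengths above, using the standard BLEU definition. A minimal sketch that reproduces it:

```python
import math

def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty: penalizes system output
    that is shorter than the reference in total."""
    if system_length >= reference_length:
        return 1.0
    return math.exp(1.0 - reference_length / system_length)

# Evaluation-set lengths reported above.
bp = brevity_penalty(19102, 20793)
print(round(bp, 4))  # ~0.9153; the card reports 0.9152, matching up to rounding
```
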
## Model description

See [google/mt5-large](https://huggingface.co/google/mt5-large) for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

## Intended uses & limitations

This model can be used for question generation on German text.

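lmqg-style question-generation models are typically given the context with the answer span wrapped in highlight tokens. The exact prompt format (the `generate question:` prefix and the `<hl>` token) is an assumption here and should be checked against the training code for this checkpoint; a minimal sketch:

```python
def make_qg_input(context: str, answer: str, hl_token: str = "<hl>") -> str:
    """Wrap the answer span in highlight tokens, as lmqg-style QG models
    commonly expect. NOTE: the prefix and highlight token are assumptions;
    verify them against this checkpoint's training setup."""
    start = context.index(answer)
    end = start + len(answer)
    return f"generate question: {context[:start]}{hl_token} {answer} {hl_token}{context[end:]}"

context = "Johann Wolfgang von Goethe wurde 1749 in Frankfurt am Main geboren."
print(make_qg_input(context, "1749"))

# Inference itself (requires downloading the checkpoint):
# from transformers import pipeline
# qg = pipeline("text2text-generation", model="<this-checkpoint>")
# print(qg(make_qg_input(context, "1749")))
```
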
## Training and evaluation data

See [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 7
- gradient_accumulation_steps: 64
- total_train_batch_size: 64
- optimizer: Adafactor
- lr_scheduler_type: constant
- num_epochs: 20

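The per-device batch size of 1 combined with 64 gradient-accumulation steps gives the effective batch size of 64; together with the roughly 145 optimizer steps per epoch in the results table, this implies on the order of 9,300 training examples:

```python
train_batch_size = 1
gradient_accumulation_steps = 64
effective_batch_size = train_batch_size * gradient_accumulation_steps  # 64

# ~145 optimizer steps per epoch, per the training-results table.
steps_per_epoch = 145
approx_train_examples = steps_per_epoch * effective_batch_size
print(effective_batch_size, approx_train_examples)  # 64 9280
```
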
### Training results

| Training Loss | Epoch | Step | BLEU | Brevity Penalty | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Exact Match | F1 | Mean Generated Length | Validation Loss | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | System Length | Totals 1 | Totals 2 | Totals 3 | Totals 4 |
|:-------------:|:-----:|:----:|:-------:|:---------------:|:--------:|:--------:|:--------:|:--------:|:-----------:|:------:|:---------------------:|:---------------:|:------------:|:------------:|:------------:|:------------:|:----------------:|:-------:|:-------:|:-------:|:----------:|:-------------:|:--------:|:--------:|:--------:|:--------:|
| 2.732 | 1.0 | 145 | 12.4473 | 0.7805 | 7779 | 2893 | 1393 | 685 | 0.0168 | 0.3393 | 12.2523 | 1.2989 | 45.6809 | 19.5143 | 11.0372 | 6.5758 | 21250 | 0.3487 | 0.1796 | 0.3329 | 0.3327 | 17029 | 17029 | 14825 | 12621 | 10417 |
| 1.5514 | 2.0 | 291 | 14.7663 | 0.7871 | 8297 | 3336 | 1711 | 899 | 0.025 | 0.3743 | 12.441 | 1.2100 | 48.3931 | 22.3278 | 13.4333 | 8.5351 | 21250 | 0.3839 | 0.2089 | 0.3688 | 0.369 | 17145 | 17145 | 14941 | 12737 | 10533 |
| 1.3546 | 3.0 | 435 | 16.3903 | 0.7798 | 8930 | 3713 | 1905 | 1022 | 0.034 | 0.4155 | 12.6021 | 1.1428 | 52.4739 | 25.0641 | 15.1071 | 9.8213 | 21250 | 0.4225 | 0.2345 | 0.4075 | 0.4074 | 17018 | 17018 | 14814 | 12610 | 10406 |
| 1.1969 | 4.0 | 581 | 17.8161 | 0.8441 | 9456 | 3994 | 2096 | 1157 | 0.0386 | 0.4334 | 13.4061 | 1.1113 | 52.039 | 25.0141 | 15.2292 | 10.0095 | 21250 | 0.4409 | 0.246 | 0.4251 | 0.4251 | 18171 | 18171 | 15967 | 13763 | 11559 |
| 1.0876 | 5.0 | 726 | 18.6911 | 0.8446 | 9606 | 4162 | 2233 | 1243 | 0.0377 | 0.443 | 13.5599 | 1.1032 | 52.8412 | 26.0532 | 16.2152 | 10.7461 | 21250 | 0.4504 | 0.2571 | 0.4356 | 0.4357 | 18179 | 18179 | 15975 | 13771 | 11567 |
| 0.9881 | 6.0 | 872 | 18.7071 | 0.8481 | 9608 | 4167 | 2235 | 1246 | 0.044 | 0.4429 | 13.6978 | 1.1119 | 52.661 | 25.9772 | 16.1523 | 10.7109 | 21250 | 0.4505 | 0.2567 | 0.4348 | 0.4349 | 18245 | 18245 | 16041 | 13837 | 11633 |
| 0.9142 | 7.0 | 1017 | 19.3053 | 0.8506 | 9757 | 4285 | 2311 | 1310 | 0.0495 | 0.451 | 13.5826 | 1.1106 | 53.3432 | 26.6364 | 16.6463 | 11.2167 | 21250 | 0.4587 | 0.2641 | 0.4427 | 0.443 | 18291 | 18291 | 16087 | 13883 | 11679 |
| 0.8323 | 8.0 | 1163 | 19.4102 | 0.8507 | 9757 | 4300 | 2341 | 1317 | 0.0472 | 0.4513 | 13.6239 | 1.1327 | 53.3373 | 26.7263 | 16.8599 | 11.2747 | 21250 | 0.4587 | 0.2662 | 0.4429 | 0.4426 | 18293 | 18293 | 16089 | 13885 | 11681 |
| 0.7742 | 9.0 | 1308 | 19.3574 | 0.8497 | 9757 | 4273 | 2324 | 1320 | 0.049 | 0.451 | 13.5944 | 1.1574 | 53.3957 | 26.5916 | 16.7616 | 11.3198 | 21250 | 0.4585 | 0.2653 | 0.4431 | 0.443 | 18273 | 18273 | 16069 | 13865 | 11661 |
| 0.7101 | 10.0 | 1454 | 20.1003 | 0.8694 | 9861 | 4403 | 2438 | 1416 | 0.0531 | 0.4525 | 13.9133 | 1.1674 | 52.8995 | 26.7871 | 17.1292 | 11.7716 | 21250 | 0.4594 | 0.2689 | 0.444 | 0.4435 | 18641 | 18641 | 16437 | 14233 | 12029 |
| 0.6642 | 10.99 | 1599 | 19.655 | 0.8558 | 9868 | 4380 | 2358 | 1337 | 0.0476 | 0.4551 | 13.9142 | 1.1889 | 53.6713 | 27.0671 | 16.8694 | 11.3555 | 21250 | 0.4622 | 0.2694 | 0.4469 | 0.4466 | 18386 | 18386 | 16182 | 13978 | 11774 |
| 0.6067 | 12.0 | 1745 | 19.9169 | 0.8828 | 9872 | 4384 | 2408 | 1395 | 0.0472 | 0.4489 | 14.2482 | 1.2207 | 52.2494 | 26.2672 | 16.6229 | 11.3581 | 21250 | 0.4569 | 0.2667 | 0.441 | 0.4408 | 18894 | 18894 | 16690 | 14486 | 12282 |
| 0.5684 | 12.99 | 1890 | 19.5451 | 0.8831 | 9870 | 4356 | 2360 | 1329 | 0.0485 | 0.4506 | 14.2432 | 1.2587 | 52.2195 | 26.0885 | 16.2837 | 10.8145 | 21250 | 0.4581 | 0.2651 | 0.4414 | 0.4409 | 18901 | 18901 | 16697 | 14493 | 12289 |
| 0.5288 | 14.0 | 2036 | 19.6648 | 0.8547 | 9815 | 4360 | 2389 | 1335 | 0.0454 | 0.4504 | 13.7432 | 1.2804 | 53.4382 | 26.9752 | 17.1144 | 11.3569 | 21250 | 0.4592 | 0.2671 | 0.4443 | 0.4436 | 18367 | 18367 | 16163 | 13959 | 11755 |
| 0.4902 | 14.99 | 2181 | 19.8138 | 0.8766 | 9886 | 4407 | 2398 | 1359 | 0.0495 | 0.451 | 14.1225 | 1.3211 | 52.6495 | 26.5914 | 16.6887 | 11.1714 | 21250 | 0.4582 | 0.2674 | 0.4426 | 0.4421 | 18777 | 18777 | 16573 | 14369 | 12165 |
| 0.4498 | 16.0 | 2327 | 20.0703 | 0.909 | 10008 | 4477 | 2456 | 1381 | 0.0476 | 0.4491 | 14.3725 | 1.3621 | 51.5903 | 26.0366 | 16.3832 | 10.8 | 21250 | 0.4569 | 0.2679 | 0.4415 | 0.4412 | 19399 | 19399 | 17195 | 14991 | 12787 |
| 0.4216 | 16.99 | 2472 | 20.1319 | 0.8948 | 10016 | 4483 | 2455 | 1385 | 0.0481 | 0.4531 | 14.3008 | 1.3967 | 52.3712 | 26.4937 | 16.6814 | 11.0685 | 21250 | 0.4615 | 0.2705 | 0.4457 | 0.4451 | 19125 | 19125 | 16921 | 14717 | 12513 |
| 0.3829 | 18.0 | 2618 | 19.8508 | 0.9123 | 9976 | 4407 | 2412 | 1374 | 0.0476 | 0.4479 | 14.7046 | 1.4460 | 51.2536 | 25.533 | 16.0202 | 10.6909 | 21250 | 0.4556 | 0.2627 | 0.4387 | 0.4385 | 19464 | 19464 | 17260 | 15056 | 12852 |
| 0.3551 | 19.0 | 2764 | 20.0572 | 0.8952 | 10010 | 4451 | 2438 | 1385 | 0.0463 | 0.4523 | 14.3807 | 1.4725 | 52.3235 | 26.2953 | 16.5591 | 11.0632 | 21250 | 0.4606 | 0.2672 | 0.4438 | 0.4434 | 19131 | 19131 | 16927 | 14723 | 12519 |
| 0.3301 | 19.93 | 2900 | 19.8047 | 0.8816 | 9858 | 4378 | 2406 | 1368 | 0.0495 | 0.4483 | 14.2795 | 1.5030 | 52.2361 | 26.2659 | 16.6344 | 11.1582 | 21250 | 0.4569 | 0.2644 | 0.4412 | 0.4405 | 18872 | 18872 | 16668 | 14464 | 12260 |

### Framework versions

- Transformers 4.32.1
- Pytorch 2.1.0
- Datasets 2.12.0
- Tokenizers 0.13.3
|