---
language:
- de
tags:
- question-generation
- german
- text2text-generation
- generated_from_trainer
datasets:
- lmqg/qg_dequad
metrics:
- bleu4
- f1
- rouge
- exact_match
model-index:
- name: german-jeopardy-mt5-large
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: lmqg/qg_dequad
      type: default
      args: default
    metrics:
    - name: BLEU-4
      type: bleu4
      value: 15.09
    - name: F1
      type: f1
      value: 40.69
    - name: ROUGE-1
      type: rouge1
      value: 41.68
    - name: ROUGE-2
      type: rouge2
      value: 22.07
    - name: ROUGE-L
      type: rougel
      value: 40.20
    - name: ROUGE-Lsum
      type: rougelsum
      value: 40.19
    - name: Exact Match
      type: exact_match
      value: 2.77
---
# german-jeopardy-mt5-large-1k-64-constant

This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on the [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.8162
- Brevity Penalty: 0.9152
- System Length: 19102
- Reference Length: 20793
- ROUGE-1: 41.68
- ROUGE-2: 22.07
- ROUGE-L: 40.20
- ROUGE-Lsum: 40.19
- Exact Match: 2.77
- BLEU: 15.09
- F1: 40.69

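The reported brevity penalty follows directly from the system and reference lengths above, using the standard BLEU definition. A minimal sketch that reproduces it:

```python
import math

def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty: penalizes system output
    that is shorter than the reference in total."""
    if system_length >= reference_length:
        return 1.0
    return math.exp(1.0 - reference_length / system_length)

# Evaluation-set lengths reported above.
bp = brevity_penalty(19102, 20793)
print(round(bp, 4))  # ~0.9153; the card reports 0.9152, matching up to rounding
```
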
## Model description

See [google/mt5-large](https://huggingface.co/google/mt5-large) for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

## Intended uses & limitations

This model can be used for question generation on German text.

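lmqg-style question-generation models are typically given the context with the answer span wrapped in highlight tokens. The exact prompt format (the `generate question:` prefix and the `<hl>` token) is an assumption here and should be checked against the training code for this checkpoint; a minimal sketch:

```python
def make_qg_input(context: str, answer: str, hl_token: str = "<hl>") -> str:
    """Wrap the answer span in highlight tokens, as lmqg-style QG models
    commonly expect. NOTE: the prefix and highlight token are assumptions;
    verify them against this checkpoint's training setup."""
    start = context.index(answer)
    end = start + len(answer)
    return f"generate question: {context[:start]}{hl_token} {answer} {hl_token}{context[end:]}"

context = "Johann Wolfgang von Goethe wurde 1749 in Frankfurt am Main geboren."
print(make_qg_input(context, "1749"))

# Inference itself (requires downloading the checkpoint):
# from transformers import pipeline
# qg = pipeline("text2text-generation", model="<this-checkpoint>")
# print(qg(make_qg_input(context, "1749")))
```
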
## Training and evaluation data

See [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 7
- gradient_accumulation_steps: 64
- total_train_batch_size: 64
- optimizer: Adafactor
- lr_scheduler_type: constant
- num_epochs: 20

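The per-device batch size of 1 combined with 64 gradient-accumulation steps gives the effective batch size of 64; together with the roughly 145 optimizer steps per epoch in the results table, this implies on the order of 9,300 training examples:

```python
train_batch_size = 1
gradient_accumulation_steps = 64
effective_batch_size = train_batch_size * gradient_accumulation_steps  # 64

# ~145 optimizer steps per epoch, per the training-results table.
steps_per_epoch = 145
approx_train_examples = steps_per_epoch * effective_batch_size
print(effective_batch_size, approx_train_examples)  # 64 9280
```
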
### Training results

| Training Loss | Epoch | Step | BLEU | Brevity Penalty | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Exact Match | F1 | Mean Generated Length | Validation Loss | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | System Length | Totals 1 | Totals 2 | Totals 3 | Totals 4 |
|:-------------:|:-----:|:----:|:-------:|:---------------:|:--------:|:--------:|:--------:|:--------:|:-----------:|:------:|:---------------------:|:---------------:|:------------:|:------------:|:------------:|:------------:|:----------------:|:-------:|:-------:|:-------:|:----------:|:-------------:|:--------:|:--------:|:--------:|:--------:|
| 2.732 | 1.0 | 145 | 12.4473 | 0.7805 | 7779 | 2893 | 1393 | 685 | 0.0168 | 0.3393 | 12.2523 | 1.2989 | 45.6809 | 19.5143 | 11.0372 | 6.5758 | 21250 | 0.3487 | 0.1796 | 0.3329 | 0.3327 | 17029 | 17029 | 14825 | 12621 | 10417 |
| 1.5514 | 2.0 | 291 | 14.7663 | 0.7871 | 8297 | 3336 | 1711 | 899 | 0.025 | 0.3743 | 12.441 | 1.2100 | 48.3931 | 22.3278 | 13.4333 | 8.5351 | 21250 | 0.3839 | 0.2089 | 0.3688 | 0.369 | 17145 | 17145 | 14941 | 12737 | 10533 |
| 1.3546 | 3.0 | 435 | 16.3903 | 0.7798 | 8930 | 3713 | 1905 | 1022 | 0.034 | 0.4155 | 12.6021 | 1.1428 | 52.4739 | 25.0641 | 15.1071 | 9.8213 | 21250 | 0.4225 | 0.2345 | 0.4075 | 0.4074 | 17018 | 17018 | 14814 | 12610 | 10406 |
| 1.1969 | 4.0 | 581 | 17.8161 | 0.8441 | 9456 | 3994 | 2096 | 1157 | 0.0386 | 0.4334 | 13.4061 | 1.1113 | 52.039 | 25.0141 | 15.2292 | 10.0095 | 21250 | 0.4409 | 0.246 | 0.4251 | 0.4251 | 18171 | 18171 | 15967 | 13763 | 11559 |
| 1.0876 | 5.0 | 726 | 18.6911 | 0.8446 | 9606 | 4162 | 2233 | 1243 | 0.0377 | 0.443 | 13.5599 | 1.1032 | 52.8412 | 26.0532 | 16.2152 | 10.7461 | 21250 | 0.4504 | 0.2571 | 0.4356 | 0.4357 | 18179 | 18179 | 15975 | 13771 | 11567 |
| 0.9881 | 6.0 | 872 | 18.7071 | 0.8481 | 9608 | 4167 | 2235 | 1246 | 0.044 | 0.4429 | 13.6978 | 1.1119 | 52.661 | 25.9772 | 16.1523 | 10.7109 | 21250 | 0.4505 | 0.2567 | 0.4348 | 0.4349 | 18245 | 18245 | 16041 | 13837 | 11633 |
| 0.9142 | 7.0 | 1017 | 19.3053 | 0.8506 | 9757 | 4285 | 2311 | 1310 | 0.0495 | 0.451 | 13.5826 | 1.1106 | 53.3432 | 26.6364 | 16.6463 | 11.2167 | 21250 | 0.4587 | 0.2641 | 0.4427 | 0.443 | 18291 | 18291 | 16087 | 13883 | 11679 |
| 0.8323 | 8.0 | 1163 | 19.4102 | 0.8507 | 9757 | 4300 | 2341 | 1317 | 0.0472 | 0.4513 | 13.6239 | 1.1327 | 53.3373 | 26.7263 | 16.8599 | 11.2747 | 21250 | 0.4587 | 0.2662 | 0.4429 | 0.4426 | 18293 | 18293 | 16089 | 13885 | 11681 |
| 0.7742 | 9.0 | 1308 | 19.3574 | 0.8497 | 9757 | 4273 | 2324 | 1320 | 0.049 | 0.451 | 13.5944 | 1.1574 | 53.3957 | 26.5916 | 16.7616 | 11.3198 | 21250 | 0.4585 | 0.2653 | 0.4431 | 0.443 | 18273 | 18273 | 16069 | 13865 | 11661 |
| 0.7101 | 10.0 | 1454 | 20.1003 | 0.8694 | 9861 | 4403 | 2438 | 1416 | 0.0531 | 0.4525 | 13.9133 | 1.1674 | 52.8995 | 26.7871 | 17.1292 | 11.7716 | 21250 | 0.4594 | 0.2689 | 0.444 | 0.4435 | 18641 | 18641 | 16437 | 14233 | 12029 |
| 0.6642 | 10.99 | 1599 | 19.655 | 0.8558 | 9868 | 4380 | 2358 | 1337 | 0.0476 | 0.4551 | 13.9142 | 1.1889 | 53.6713 | 27.0671 | 16.8694 | 11.3555 | 21250 | 0.4622 | 0.2694 | 0.4469 | 0.4466 | 18386 | 18386 | 16182 | 13978 | 11774 |
| 0.6067 | 12.0 | 1745 | 19.9169 | 0.8828 | 9872 | 4384 | 2408 | 1395 | 0.0472 | 0.4489 | 14.2482 | 1.2207 | 52.2494 | 26.2672 | 16.6229 | 11.3581 | 21250 | 0.4569 | 0.2667 | 0.441 | 0.4408 | 18894 | 18894 | 16690 | 14486 | 12282 |
| 0.5684 | 12.99 | 1890 | 19.5451 | 0.8831 | 9870 | 4356 | 2360 | 1329 | 0.0485 | 0.4506 | 14.2432 | 1.2587 | 52.2195 | 26.0885 | 16.2837 | 10.8145 | 21250 | 0.4581 | 0.2651 | 0.4414 | 0.4409 | 18901 | 18901 | 16697 | 14493 | 12289 |
| 0.5288 | 14.0 | 2036 | 19.6648 | 0.8547 | 9815 | 4360 | 2389 | 1335 | 0.0454 | 0.4504 | 13.7432 | 1.2804 | 53.4382 | 26.9752 | 17.1144 | 11.3569 | 21250 | 0.4592 | 0.2671 | 0.4443 | 0.4436 | 18367 | 18367 | 16163 | 13959 | 11755 |
| 0.4902 | 14.99 | 2181 | 19.8138 | 0.8766 | 9886 | 4407 | 2398 | 1359 | 0.0495 | 0.451 | 14.1225 | 1.3211 | 52.6495 | 26.5914 | 16.6887 | 11.1714 | 21250 | 0.4582 | 0.2674 | 0.4426 | 0.4421 | 18777 | 18777 | 16573 | 14369 | 12165 |
| 0.4498 | 16.0 | 2327 | 20.0703 | 0.909 | 10008 | 4477 | 2456 | 1381 | 0.0476 | 0.4491 | 14.3725 | 1.3621 | 51.5903 | 26.0366 | 16.3832 | 10.8 | 21250 | 0.4569 | 0.2679 | 0.4415 | 0.4412 | 19399 | 19399 | 17195 | 14991 | 12787 |
| 0.4216 | 16.99 | 2472 | 20.1319 | 0.8948 | 10016 | 4483 | 2455 | 1385 | 0.0481 | 0.4531 | 14.3008 | 1.3967 | 52.3712 | 26.4937 | 16.6814 | 11.0685 | 21250 | 0.4615 | 0.2705 | 0.4457 | 0.4451 | 19125 | 19125 | 16921 | 14717 | 12513 |
| 0.3829 | 18.0 | 2618 | 19.8508 | 0.9123 | 9976 | 4407 | 2412 | 1374 | 0.0476 | 0.4479 | 14.7046 | 1.4460 | 51.2536 | 25.533 | 16.0202 | 10.6909 | 21250 | 0.4556 | 0.2627 | 0.4387 | 0.4385 | 19464 | 19464 | 17260 | 15056 | 12852 |
| 0.3551 | 19.0 | 2764 | 20.0572 | 0.8952 | 10010 | 4451 | 2438 | 1385 | 0.0463 | 0.4523 | 14.3807 | 1.4725 | 52.3235 | 26.2953 | 16.5591 | 11.0632 | 21250 | 0.4606 | 0.2672 | 0.4438 | 0.4434 | 19131 | 19131 | 16927 | 14723 | 12519 |
| 0.3301 | 19.93 | 2900 | 19.8047 | 0.8816 | 9858 | 4378 | 2406 | 1368 | 0.0495 | 0.4483 | 14.2795 | 1.5030 | 52.2361 | 26.2659 | 16.6344 | 11.1582 | 21250 | 0.4569 | 0.2644 | 0.4412 | 0.4405 | 18872 | 18872 | 16668 | 14464 | 12260 |

### Framework versions

- Transformers 4.32.1
- Pytorch 2.1.0
- Datasets 2.12.0
- Tokenizers 0.13.3
|