
german-jeopardy-mt5-base

This model is a fine-tuned version of google/mt5-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.66
  • Brevity Penalty: 0.9025
  • System Length: 18860
  • Reference Length: 20793
  • ROUGE-1: 40.62
  • ROUGE-2: 21.49
  • ROUGE-L: 39.14
  • ROUGE-Lsum: 39.13
  • Exact Match: 2.72
  • BLEU: 14.56
  • F1: 39.53
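
As a sanity check, the reported brevity penalty follows from the system and reference lengths above under the standard BLEU definition; a minimal recomputation in Python:

```python
import math

# Standard BLEU brevity penalty: BP = 1 if the system output is at least as
# long as the reference, otherwise exp(1 - reference_length / system_length).
system_length = 18860
reference_length = 20793

bp = 1.0 if system_length >= reference_length else math.exp(1 - reference_length / system_length)
print(f"{bp:.6f}")  # 0.902585, matching the reported 0.9025
```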

Model description

See google/mt5-base for the model architecture; the fine-tuned checkpoint has roughly 582M parameters, stored as F32 safetensors.
The model was trained on a single NVIDIA RTX 3090 GPU with 24GB of VRAM.
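
A quick way to confirm the parameter count, assuming the hub id GiantTreeG/german-jeopardy-mt5-base used by this repository:

```python
from transformers import AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint and count its parameters.
model = AutoModelForSeq2SeqLM.from_pretrained("GiantTreeG/german-jeopardy-mt5-base")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expected: ~582M for an mT5-base checkpoint
```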

Intended uses & limitations

This model can be used for question generation on German text.
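
A minimal inference sketch. The hub id is GiantTreeG/german-jeopardy-mt5-base; the highlight-token input format shown here follows the usual lmqg convention and is an assumption, so adjust it to match the training format if generations look off:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "GiantTreeG/german-jeopardy-mt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumed lmqg-style input: the answer span is wrapped in <hl> tokens inside
# the German context; this exact format is not confirmed by the card.
text = "generate question: Berlin ist die Hauptstadt von <hl> Deutschland <hl>."

inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```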

Training and evaluation data

See lmqg/qg_dequad.
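
The dataset can be pulled directly from the hub with the datasets library; a minimal sketch, assuming the usual lmqg train/validation/test split layout:

```python
from datasets import load_dataset

# German question-generation data from the lmqg project.
dataset = load_dataset("lmqg/qg_dequad")
print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # inspect the fields of a single example
```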

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto Seq2SeqTrainingArguments follows the list:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 7
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
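
A minimal sketch of how these values map onto transformers' Seq2SeqTrainingArguments; output_dir and evaluation_strategy are assumptions, since the actual training script is not part of this card:

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters above; 4 (per-device batch) x 16 (accumulation
# steps) yields the effective total train batch size of 64.
training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-base",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=7,
    gradient_accumulation_steps=16,
    optim="adafactor",
    lr_scheduler_type="constant",
    num_train_epochs=20,
    evaluation_strategy="epoch",  # assumption: the results table logs once per epoch
)
```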

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.5131 | 1.0 | 145 | 1.8698 | 6032 | 1668 | 626 | 216 | 16023 | 13819 | 11615 | 9411 | 37.6459 | 12.0703 | 5.3896 | 2.2952 | 0.7216 | 16023 | 21250 | 0.2485 | 0.1011 | 0.2368 | 0.2366 | 0.0018 | 6.2485 | 12.6166 | 0.2406 |
| 2.3946 | 2.0 | 291 | 1.5888 | 7325 | 2554 | 1178 | 558 | 16853 | 14649 | 12445 | 10241 | 43.4641 | 17.4346 | 9.4656 | 5.4487 | 0.7704 | 16853 | 21250 | 0.3226 | 0.1585 | 0.31 | 0.31 | 0.0145 | 10.8315 | 12.2582 | 0.3148 |
| 2.0101 | 3.0 | 436 | 1.4997 | 7623 | 2764 | 1304 | 629 | 17042 | 14838 | 12634 | 10430 | 44.7307 | 18.6278 | 10.3214 | 6.0307 | 0.7812 | 17042 | 21250 | 0.3403 | 0.1723 | 0.3263 | 0.3263 | 0.0154 | 11.7891 | 12.6783 | 0.3315 |
| 1.8073 | 4.0 | 582 | 1.4610 | 7728 | 2916 | 1415 | 707 | 16654 | 14450 | 12246 | 10042 | 46.4033 | 20.1799 | 11.5548 | 7.0404 | 0.7588 | 16654 | 21250 | 0.3461 | 0.1818 | 0.3324 | 0.3326 | 0.0168 | 12.6068 | 12.2963 | 0.3387 |
| 1.6851 | 4.99 | 727 | 1.4357 | 7964 | 3059 | 1483 | 727 | 17381 | 15177 | 12973 | 10769 | 45.8201 | 20.1555 | 11.4314 | 6.7509 | 0.8004 | 17381 | 21250 | 0.3558 | 0.1888 | 0.3415 | 0.3414 | 0.0159 | 13.0784 | 12.7436 | 0.3483 |
| 1.5642 | 6.0 | 873 | 1.4003 | 8299 | 3224 | 1592 | 788 | 17351 | 15147 | 12943 | 10739 | 47.8301 | 21.2847 | 12.3001 | 7.3377 | 0.7987 | 17351 | 21250 | 0.3814 | 0.2025 | 0.3684 | 0.3685 | 0.0204 | 13.9065 | 12.9569 | 0.3736 |
| 1.4756 | 6.99 | 1018 | 1.3779 | 8640 | 3430 | 1712 | 879 | 17669 | 15465 | 13261 | 11057 | 48.8992 | 22.1791 | 12.91 | 7.9497 | 0.8165 | 17669 | 21250 | 0.3971 | 0.2133 | 0.3828 | 0.3826 | 0.025 | 14.9146 | 13.1084 | 0.3892 |
| 1.3792 | 8.0 | 1164 | 1.3624 | 8732 | 3417 | 1712 | 871 | 17996 | 15792 | 13588 | 11384 | 48.5219 | 21.6375 | 12.5994 | 7.6511 | 0.8346 | 17996 | 21250 | 0.4003 | 0.2131 | 0.3852 | 0.3849 | 0.0245 | 14.8859 | 13.3748 | 0.3917 |
| 1.3133 | 9.0 | 1310 | 1.3630 | 8804 | 3500 | 1754 | 920 | 17661 | 15457 | 13253 | 11049 | 49.85 | 22.6435 | 13.2347 | 8.3265 | 0.8161 | 17661 | 21250 | 0.4078 | 0.219 | 0.3932 | 0.3935 | 0.025 | 15.3264 | 13.2019 | 0.4 |
| 1.261 | 10.0 | 1455 | 1.3685 | 8910 | 3602 | 1849 | 1000 | 17709 | 15505 | 13301 | 11097 | 50.3134 | 23.2312 | 13.9012 | 9.0114 | 0.8188 | 17709 | 21250 | 0.4135 | 0.223 | 0.3991 | 0.3992 | 0.0295 | 16.0163 | 13.1892 | 0.4055 |
| 1.1897 | 11.0 | 1601 | 1.3639 | 9096 | 3690 | 1902 | 1012 | 18261 | 16057 | 13853 | 11649 | 49.8111 | 22.9806 | 13.7299 | 8.6874 | 0.849 | 18261 | 21250 | 0.4201 | 0.2289 | 0.4059 | 0.4057 | 0.0281 | 16.3202 | 13.5077 | 0.4121 |
| 1.1453 | 11.99 | 1746 | 1.3610 | 9106 | 3735 | 1932 | 1023 | 18329 | 16125 | 13921 | 11717 | 49.6808 | 23.1628 | 13.8783 | 8.7309 | 0.8527 | 18329 | 21250 | 0.4173 | 0.2303 | 0.4026 | 0.4025 | 0.0281 | 16.4772 | 13.8013 | 0.4099 |
| 1.0858 | 13.0 | 1892 | 1.3716 | 9245 | 3778 | 1955 | 1049 | 18556 | 16352 | 14148 | 11944 | 49.8222 | 23.1042 | 13.8182 | 8.7827 | 0.8649 | 18556 | 21250 | 0.4244 | 0.2327 | 0.409 | 0.409 | 0.0322 | 16.7204 | 13.8144 | 0.417 |
| 1.0472 | 13.99 | 2037 | 1.3770 | 9166 | 3756 | 1946 | 1054 | 18315 | 16111 | 13907 | 11703 | 50.0464 | 23.3133 | 13.993 | 9.0062 | 0.8519 | 18315 | 21250 | 0.4216 | 0.2311 | 0.4068 | 0.4067 | 0.0309 | 16.6825 | 13.8099 | 0.4143 |
| 0.9953 | 15.0 | 2183 | 1.3881 | 9342 | 3926 | 2046 | 1108 | 18132 | 15928 | 13724 | 11520 | 51.5222 | 24.6484 | 14.9082 | 9.6181 | 0.842 | 18132 | 21250 | 0.4328 | 0.2418 | 0.4171 | 0.4171 | 0.0327 | 17.3937 | 13.5023 | 0.4258 |
| 0.9509 | 16.0 | 2329 | 1.4016 | 9330 | 3894 | 2024 | 1084 | 18672 | 16468 | 14264 | 12060 | 49.9679 | 23.6459 | 14.1896 | 8.9884 | 0.871 | 18672 | 21250 | 0.4269 | 0.237 | 0.4123 | 0.4122 | 0.0313 | 17.1618 | 13.956 | 0.4198 |
| 0.9183 | 17.0 | 2474 | 1.4152 | 9303 | 3824 | 1979 | 1084 | 18476 | 16272 | 14068 | 11864 | 50.3518 | 23.5005 | 14.0674 | 9.1369 | 0.8606 | 18476 | 21250 | 0.4269 | 0.2345 | 0.4121 | 0.4122 | 0.0327 | 16.995 | 13.7854 | 0.4199 |
| 0.8696 | 18.0 | 2620 | 1.4404 | 9184 | 3798 | 1993 | 1085 | 18379 | 16175 | 13971 | 11767 | 49.9701 | 23.4807 | 14.2653 | 9.2207 | 0.8554 | 18379 | 21250 | 0.4218 | 0.2333 | 0.4076 | 0.4074 | 0.034 | 16.9541 | 13.726 | 0.4148 |
| 0.8389 | 19.0 | 2765 | 1.4360 | 9476 | 4000 | 2092 | 1139 | 19003 | 16799 | 14595 | 12391 | 49.8658 | 23.8109 | 14.3337 | 9.1922 | 0.8885 | 19003 | 21250 | 0.4307 | 0.2406 | 0.4161 | 0.416 | 0.0299 | 17.67 | 14.2064 | 0.4239 |
| 0.7993 | 19.92 | 2900 | 1.4545 | 9464 | 3970 | 2078 | 1126 | 18741 | 16537 | 14333 | 12129 | 50.4989 | 24.0068 | 14.498 | 9.2835 | 0.8747 | 18741 | 21250 | 0.4349 | 0.2424 | 0.4194 | 0.4192 | 0.0327 | 17.5799 | 13.9959 | 0.4269 |

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3