german-jeopardy-mt5-base-256

This model is a fine-tuned version of google/mt5-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set (a check of the brevity penalty against the reported lengths follows the list):

  • Loss: 1.51
  • Brevity Penalty: 0.8658
  • System Length: 18174
  • Reference Length: 20793
  • ROUGE-1: 38.80
  • ROUGE-2: 20.27
  • ROUGE-L: 37.34
  • ROUGE-Lsum: 37.32
  • Exact Match: 2.81
  • BLEU: 13.70
  • F1: 37.79
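
As a sanity check, the reported brevity penalty follows directly from the system and reference lengths above, using the standard BLEU definition (BP = exp(1 − ref_len/sys_len) when the system output is shorter than the reference, else 1):

```python
import math

# BLEU brevity penalty (Papineni et al., 2002): penalizes system output
# that is shorter than the reference; no penalty otherwise.
sys_len, ref_len = 18174, 20793
bp = math.exp(1 - ref_len / sys_len) if sys_len < ref_len else 1.0
print(f"{bp:.4f}")  # 0.8658, matching the reported Brevity Penalty
```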

Model description

See google/mt5-base for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
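
A minimal generation sketch with Transformers is shown below. The input format is an assumption based on the lmqg convention (a `generate question:` prefix and `<hl>` tokens around the answer span); verify it against the training preprocessing before relying on it:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "GiantTreeG/german-jeopardy-mt5-base-256"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# German context with the answer span marked by <hl> tokens; this prompt
# format follows the lmqg convention and is an assumption, not confirmed
# by this card.
context = (
    "generate question: Das Brandenburger Tor steht in <hl> Berlin <hl> "
    "und wurde Ende des 18. Jahrhunderts errichtet."
)

inputs = tokenizer(context, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```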

Training and evaluation data

See lmqg/qg_dequad.
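
The dataset can be loaded with the datasets library; split and field names follow the lmqg convention and are best confirmed against the dataset card:

```python
from datasets import load_dataset

# German SQuAD-derived question-generation corpus used for fine-tuning.
dataset = load_dataset("lmqg/qg_dequad")
print(dataset)             # available splits
print(dataset["train"][0]) # inspect one example's fields
```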

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Transformers training arguments follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 7
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 256
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
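
As a rough guide, the settings above map onto Transformers training arguments as follows. This is a hypothetical reconstruction, not the published training script; `output_dir` and any argument not listed above are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the listed hyperparameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-base-256",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=7,
    gradient_accumulation_steps=64,  # 4 * 64 = 256 effective batch size
    lr_scheduler_type="constant",
    num_train_epochs=20,
    optim="adafactor",
)
```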

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8.9608 | 0.99 | 36 | 2.8883 | 2306 | 50 | 12 | 2 | 17876 | 15672 | 13468 | 11264 | 12.9 | 0.319 | 0.0891 | 0.0178 | 0.828 | 17876 | 21250 | 0.0081 | 0.0022 | 0.0078 | 0.0078 | 0.0 | 0.2352 | 3.1969 | 0.0092 |
| 3.2364 | 1.98 | 72 | 1.9242 | 6125 | 1727 | 687 | 277 | 21152 | 18948 | 16744 | 14540 | 28.9571 | 9.1144 | 4.103 | 1.9051 | 0.9954 | 21152 | 21250 | 0.2457 | 0.1026 | 0.2345 | 0.2346 | 0.0018 | 6.7083 | 11.8072 | 0.2514 |
| 2.4963 | 3.0 | 109 | 1.6558 | 6903 | 2271 | 975 | 409 | 16537 | 14333 | 12129 | 9925 | 41.7428 | 15.8446 | 8.0386 | 4.1209 | 0.752 | 16537 | 21250 | 0.2966 | 0.1415 | 0.2854 | 0.2852 | 0.01 | 9.1493 | 12.176 | 0.2909 |
| 2.2314 | 3.98 | 145 | 1.5771 | 7160 | 2440 | 1098 | 501 | 16627 | 14423 | 12219 | 10015 | 43.0625 | 16.9174 | 8.986 | 5.0025 | 0.7573 | 16627 | 21250 | 0.314 | 0.1535 | 0.3028 | 0.3028 | 0.0136 | 10.187 | 12.157 | 0.3069 |
| 2.0578 | 4.97 | 181 | 1.5347 | 7447 | 2625 | 1214 | 566 | 17305 | 15101 | 12897 | 10693 | 43.0338 | 17.383 | 9.413 | 5.2932 | 0.7961 | 17305 | 21250 | 0.3286 | 0.1628 | 0.3146 | 0.3146 | 0.0163 | 11.0621 | 12.5585 | 0.32 |
| 1.8928 | 5.99 | 218 | 1.5128 | 7396 | 2659 | 1257 | 611 | 16598 | 14394 | 12190 | 9986 | 44.5596 | 18.473 | 10.3117 | 6.1186 | 0.7556 | 16598 | 21250 | 0.3326 | 0.1684 | 0.3198 | 0.3198 | 0.0177 | 11.4063 | 12.1692 | 0.3234 |
| 1.8573 | 6.98 | 254 | 1.4736 | 7531 | 2758 | 1313 | 641 | 16728 | 14524 | 12320 | 10116 | 45.0203 | 18.9893 | 10.6575 | 6.3365 | 0.7631 | 16728 | 21250 | 0.3349 | 0.1717 | 0.3216 | 0.3216 | 0.0163 | 11.8292 | 12.3035 | 0.327 |
| 1.7361 | 8.0 | 291 | 1.4544 | 7658 | 2849 | 1368 | 668 | 16928 | 14724 | 12520 | 10316 | 45.2387 | 19.3494 | 10.9265 | 6.4754 | 0.7747 | 16928 | 21250 | 0.3414 | 0.1762 | 0.3283 | 0.3284 | 0.0181 | 12.2208 | 12.4628 | 0.3334 |
| 1.7162 | 8.99 | 327 | 1.4459 | 7703 | 2891 | 1390 | 694 | 16795 | 14591 | 12387 | 10183 | 45.8648 | 19.8136 | 11.2214 | 6.8153 | 0.767 | 16795 | 21250 | 0.3454 | 0.1785 | 0.3325 | 0.3323 | 0.0159 | 12.4536 | 12.4174 | 0.3374 |
| 1.6589 | 9.98 | 363 | 1.4383 | 7889 | 2983 | 1449 | 719 | 17376 | 15172 | 12968 | 10764 | 45.4017 | 19.6612 | 11.1737 | 6.6797 | 0.8002 | 17376 | 21250 | 0.3519 | 0.1816 | 0.3375 | 0.3372 | 0.0172 | 12.8553 | 12.7101 | 0.3435 |
| 1.5571 | 10.99 | 400 | 1.4214 | 7889 | 2994 | 1457 | 736 | 17185 | 14981 | 12777 | 10573 | 45.9063 | 19.9853 | 11.4033 | 6.9611 | 0.7894 | 17185 | 21250 | 0.3529 | 0.1845 | 0.3392 | 0.3393 | 0.02 | 12.9671 | 12.6466 | 0.3457 |
| 1.5502 | 11.98 | 436 | 1.4135 | 7930 | 3008 | 1477 | 741 | 16868 | 14664 | 12460 | 10256 | 47.0121 | 20.5128 | 11.8539 | 7.225 | 0.7712 | 16868 | 21250 | 0.3619 | 0.189 | 0.3492 | 0.3491 | 0.0213 | 13.0741 | 12.4483 | 0.3541 |
| 1.4564 | 13.0 | 473 | 1.3943 | 8268 | 3200 | 1616 | 837 | 17929 | 15725 | 13521 | 11317 | 46.1152 | 20.3498 | 11.9518 | 7.396 | 0.8309 | 17929 | 21250 | 0.3729 | 0.1974 | 0.3578 | 0.3576 | 0.0218 | 14.1014 | 13.2441 | 0.3647 |
| 1.4522 | 13.99 | 509 | 1.3953 | 8047 | 3130 | 1564 | 811 | 16789 | 14585 | 12381 | 10177 | 47.9302 | 21.4604 | 12.6323 | 7.9689 | 0.7667 | 16789 | 21250 | 0.3712 | 0.197 | 0.3582 | 0.3581 | 0.0227 | 13.7526 | 12.515 | 0.3627 |
| 1.407 | 14.98 | 545 | 1.3759 | 8498 | 3358 | 1703 | 877 | 17923 | 15719 | 13515 | 11311 | 47.4139 | 21.3627 | 12.6008 | 7.7535 | 0.8306 | 17923 | 21250 | 0.3856 | 0.2063 | 0.3709 | 0.3706 | 0.0213 | 14.7315 | 13.2849 | 0.3772 |
| 1.3294 | 15.99 | 582 | 1.3776 | 8481 | 3407 | 1721 | 883 | 17451 | 15247 | 13043 | 10839 | 48.5989 | 22.3454 | 13.1948 | 8.1465 | 0.8044 | 17451 | 21250 | 0.3907 | 0.211 | 0.3766 | 0.3766 | 0.024 | 14.868 | 12.9142 | 0.3822 |
| 1.3294 | 16.98 | 618 | 1.3803 | 8633 | 3464 | 1767 | 923 | 18004 | 15800 | 13596 | 11392 | 47.9505 | 21.9241 | 12.9965 | 8.1022 | 0.835 | 18004 | 21250 | 0.3946 | 0.2133 | 0.3801 | 0.3798 | 0.0263 | 15.2312 | 13.3103 | 0.3868 |
| 1.2605 | 18.0 | 655 | 1.3710 | 8560 | 3376 | 1695 | 880 | 17830 | 15626 | 13422 | 11218 | 48.009 | 21.605 | 12.6285 | 7.8445 | 0.8255 | 17830 | 21250 | 0.3922 | 0.2092 | 0.3778 | 0.3775 | 0.0231 | 14.779 | 13.1665 | 0.3846 |
| 1.2667 | 18.99 | 691 | 1.3694 | 8664 | 3455 | 1733 | 882 | 17834 | 15630 | 13426 | 11222 | 48.5814 | 22.1049 | 12.9078 | 7.8596 | 0.8257 | 17834 | 21250 | 0.3987 | 0.2138 | 0.3853 | 0.3851 | 0.0227 | 15.0008 | 13.2232 | 0.3906 |
| 1.2074 | 19.79 | 720 | 1.3658 | 8770 | 3465 | 1737 | 880 | 18039 | 15835 | 13631 | 11427 | 48.6169 | 21.8819 | 12.743 | 7.7011 | 0.8369 | 18039 | 21250 | 0.4025 | 0.215 | 0.3883 | 0.3879 | 0.0227 | 15.0442 | 13.4424 | 0.3941 |

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3