german-jeopardy-longt5-base

This model is a fine-tuned version of google/long-t5-tglobal-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8533
  • Brevity Penalty: 0.8910 (see the consistency check after this list)
  • System Length: 18642
  • Reference Length: 20793
  • ROUGE-1: 35.31
  • ROUGE-2: 16.35
  • ROUGE-L: 33.91
  • ROUGE-Lsum: 33.96
  • Exact Match: 1.36
  • BLEU: 10.80
  • F1: 34.41
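
The brevity penalty is consistent with the reported lengths: BLEU's penalty is exp(1 - reference_length / system_length) whenever the system output is shorter than the reference. A quick check in Python:

```python
import math

# BLEU brevity penalty for a system output shorter than the reference:
# BP = exp(1 - reference_length / system_length)
bp = math.exp(1 - 20793 / 18642)
print(round(bp, 4))  # 0.891, matching the reported 0.8910
```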

Model description

See google/long-t5-tglobal-base for details on the model architecture. The fine-tuned checkpoint holds roughly 248M parameters in float32 (Safetensors).
Training was performed on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
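
A minimal inference sketch with the transformers library follows. The input format is an assumption: the card does not document the preprocessing, so the example feeds a raw German context passage to the encoder, whereas the actual training data (lmqg/qg_dequad) may use answer highlighting or a task prefix.

```python
# Minimal inference sketch (assumed input format: raw German context).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "GiantTreeG/german-jeopardy-longt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

context = "Die Zugspitze ist mit 2962 Metern der höchste Berg Deutschlands."
inputs = tokenizer(context, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```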

Training and evaluation data

See lmqg/qg_dequad.
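
The splits can be pulled directly from the Hugging Face Hub with the datasets library:

```python
# Load the question-generation dataset used for fine-tuning and evaluation.
from datasets import load_dataset

dataset = load_dataset("lmqg/qg_dequad")
print(dataset)  # DatasetDict with the available splits and their fields
```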

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 7
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
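
The training script is not included in the card; the sketch below shows one plausible mapping of these values onto Hugging Face Seq2SeqTrainingArguments, assuming the Trainer API was used.

```python
# Hypothetical mapping of the listed hyperparameters onto the Trainer API;
# the actual training code is not published with this card.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-longt5-base",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=7,
    gradient_accumulation_steps=8,  # 8 * 8 = total train batch size of 64
    optim="adafactor",
    lr_scheduler_type="constant",
    num_train_epochs=20,
)
```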

Training results

| Training Loss | Epoch | Step | BLEU | Brevity Penalty | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Exact Match | F1 | Gen Len | Validation Loss | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | System Length | Totals 1 | Totals 2 | Totals 3 | Totals 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.1671 | 1.0 | 145 | 5.9441 | 0.7156 | 6177 | 1669 | 604 | 179 | 0.0023 | 0.2528 | 12.0218 | 2.1902 | 38.7954 | 12.1665 | 5.2458 | 1.9227 | 21250 | 0.2595 | 0.1035 | 0.2491 | 0.2492 | 15922 | 15922 | 13718 | 11514 | 9310 |
| 2.5597 | 2.0 | 291 | 7.7787 | 0.7556 | 6785 | 2044 | 804 | 293 | 0.0064 | 0.2864 | 12.6084 | 2.0164 | 40.876 | 14.1994 | 6.595 | 2.9338 | 21250 | 0.2931 | 0.1291 | 0.2817 | 0.2818 | 16599 | 16599 | 14395 | 12191 | 9987 |
| 2.3464 | 2.99 | 436 | 9.2407 | 0.7935 | 7251 | 2326 | 969 | 400 | 0.0073 | 0.3114 | 13.2296 | 1.9138 | 42.0129 | 15.45 | 7.5403 | 3.7569 | 21250 | 0.3162 | 0.1456 | 0.3031 | 0.3031 | 17259 | 17259 | 15055 | 12851 | 10647 |
| 2.1679 | 4.0 | 582 | 9.6363 | 0.7795 | 7382 | 2393 | 1006 | 434 | 0.0109 | 0.3226 | 13.1207 | 1.8524 | 43.3903 | 16.1591 | 7.981 | 4.1727 | 21250 | 0.3272 | 0.1504 | 0.3147 | 0.3149 | 17013 | 17013 | 14809 | 12605 | 10401 |
| 2.0454 | 5.0 | 728 | 10.3812 | 0.7665 | 7581 | 2555 | 1111 | 482 | 0.0132 | 0.3357 | 12.9782 | 1.7997 | 45.1599 | 17.5204 | 8.9749 | 4.7371 | 21250 | 0.3401 | 0.1606 | 0.3278 | 0.3279 | 16787 | 16787 | 14583 | 12379 | 10175 |
| 1.9502 | 5.99 | 873 | 10.7668 | 0.7992 | 7759 | 2618 | 1162 | 511 | 0.0127 | 0.3406 | 13.4841 | 1.7696 | 44.6973 | 17.2748 | 8.9723 | 4.7548 | 21250 | 0.3452 | 0.1631 | 0.3321 | 0.3319 | 17359 | 17359 | 15155 | 12951 | 10747 |
| 1.8414 | 7.0 | 1019 | 11.3408 | 0.7721 | 7791 | 2693 | 1236 | 570 | 0.015 | 0.347 | 13.0563 | 1.7472 | 46.147 | 18.3459 | 9.9078 | 5.5496 | 21250 | 0.3513 | 0.1679 | 0.3391 | 0.3391 | 16883 | 16883 | 14679 | 12475 | 10271 |
| 1.7614 | 8.0 | 1165 | 11.8447 | 0.8198 | 8024 | 2799 | 1296 | 610 | 0.0145 | 0.352 | 13.515 | 1.7203 | 45.2643 | 18.0313 | 9.7305 | 5.4881 | 21250 | 0.3565 | 0.1711 | 0.3422 | 0.3423 | 17727 | 17727 | 15523 | 13319 | 11115 |
| 1.6997 | 9.0 | 1310 | 11.9689 | 0.8027 | 8046 | 2835 | 1314 | 615 | 0.0168 | 0.3568 | 13.4306 | 1.7167 | 46.183 | 18.6293 | 10.0968 | 5.6892 | 21250 | 0.3613 | 0.1746 | 0.3466 | 0.3466 | 17422 | 17422 | 15218 | 13014 | 10810 |
| 1.6159 | 10.0 | 1456 | 12.5678 | 0.8182 | 8087 | 2928 | 1395 | 681 | 0.0181 | 0.3564 | 13.5268 | 1.6892 | 45.6944 | 18.8976 | 10.4966 | 6.1429 | 21250 | 0.3612 | 0.1795 | 0.3485 | 0.3482 | 17698 | 17698 | 15494 | 13290 | 11086 |
| 1.5681 | 10.99 | 1601 | 12.497 | 0.813 | 8154 | 2933 | 1383 | 664 | 0.0168 | 0.3605 | 13.6044 | 1.6923 | 46.3164 | 19.0442 | 10.4797 | 6.0402 | 21250 | 0.3654 | 0.1789 | 0.3506 | 0.3505 | 17605 | 17605 | 15401 | 13197 | 10993 |
| 1.4987 | 12.0 | 1747 | 12.8959 | 0.8169 | 8295 | 3011 | 1432 | 697 | 0.0181 | 0.3675 | 13.6134 | 1.6825 | 46.928 | 19.461 | 10.7929 | 6.2997 | 21250 | 0.3734 | 0.1846 | 0.3576 | 0.3577 | 17676 | 17676 | 15472 | 13268 | 11064 |
| 1.4461 | 13.0 | 1893 | 12.8688 | 0.8139 | 8246 | 3005 | 1424 | 700 | 0.0191 | 0.3658 | 13.5812 | 1.6784 | 46.7964 | 19.4915 | 10.7773 | 6.3584 | 21250 | 0.3725 | 0.1857 | 0.358 | 0.3576 | 17621 | 17621 | 15417 | 13213 | 11009 |
| 1.4002 | 13.99 | 2038 | 13.4526 | 0.8329 | 8457 | 3130 | 1504 | 745 | 0.02 | 0.3727 | 13.9179 | 1.6725 | 47.0749 | 19.8591 | 11.0939 | 6.5621 | 21250 | 0.3797 | 0.1915 | 0.3637 | 0.3634 | 17965 | 17965 | 15761 | 13557 | 11353 |
| 1.3391 | 15.0 | 2184 | 13.211 | 0.8283 | 8443 | 3091 | 1468 | 719 | 0.0204 | 0.3737 | 13.9133 | 1.6783 | 47.2177 | 19.7168 | 10.8959 | 6.3803 | 21250 | 0.3804 | 0.1901 | 0.3634 | 0.363 | 17881 | 17881 | 15677 | 13473 | 11269 |
| 1.2921 | 16.0 | 2330 | 13.4907 | 0.8373 | 8457 | 3147 | 1511 | 747 | 0.0195 | 0.3716 | 13.9882 | 1.6738 | 46.8662 | 19.8662 | 11.0801 | 6.5337 | 21250 | 0.3782 | 0.1902 | 0.3624 | 0.3624 | 18045 | 18045 | 15841 | 13637 | 11433 |
| 1.2572 | 17.0 | 2475 | 13.8581 | 0.8267 | 8473 | 3219 | 1561 | 783 | 0.02 | 0.3753 | 13.7618 | 1.6770 | 47.4598 | 20.57 | 11.6103 | 6.9656 | 21250 | 0.3821 | 0.1948 | 0.3669 | 0.3665 | 17853 | 17853 | 15649 | 13445 | 11241 |
| 1.199 | 18.0 | 2621 | 13.7496 | 0.8326 | 8484 | 3190 | 1551 | 771 | 0.0186 | 0.3745 | 13.8798 | 1.6934 | 47.2409 | 20.2475 | 11.4456 | 6.7947 | 21250 | 0.3812 | 0.1922 | 0.3657 | 0.3658 | 17959 | 17959 | 15755 | 13551 | 11347 |
| 1.1668 | 18.99 | 2766 | 13.7379 | 0.8395 | 8504 | 3179 | 1541 | 776 | 0.0204 | 0.376 | 13.9256 | 1.6926 | 47.0198 | 20.0164 | 11.2663 | 6.7631 | 21250 | 0.3828 | 0.1939 | 0.3665 | 0.3665 | 18086 | 18086 | 15882 | 13678 | 11474 |
| 1.1164 | 19.91 | 2900 | 14.1906 | 0.8529 | 8625 | 3250 | 1609 | 820 | 0.0204 | 0.3803 | 14.069 | 1.7026 | 47.0463 | 20.15 | 11.5548 | 6.996 | 21250 | 0.3874 | 0.1964 | 0.3716 | 0.3715 | 18333 | 18333 | 16129 | 13925 | 11721 |
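
The BLEU column can be reproduced from the other logged columns: each Precisions n is Counts n / Totals n in percent, and BLEU is the brevity penalty times the geometric mean of the four n-gram precisions. A sanity check against the first epoch's row:

```python
import math

# Epoch-1 values from the table above (precisions are percentages).
precisions = [38.7954, 12.1665, 5.2458, 1.9227]
brevity_penalty = 0.7156

# BLEU = BP * exp(mean of log n-gram precisions)
bleu = brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / 4)
print(round(bleu, 3))  # ~5.944, matching the logged 5.9441
```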

Framework versions

  • Transformers 4.34.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.14.1