switch-base-16-xsum-ba16-lr1e-04-top-4-choose-1

This model is a fine-tuned version of ckpt/switch-base-16-xsum-ba16-lr1e-04-top-1 on the xsum dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8787
  • Rouge1: 41.7006
  • Rouge2: 18.5659
  • Rougel: 33.8505
  • Rougelsum: 33.8509
  • Gen Len: 27.3802
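
The ROUGE figures above are n-gram overlap scores between generated and reference summaries, reported on a 0 to 100 scale. As a rough illustration of what Rouge1 and Rouge2 measure, the sketch below computes a simplified ROUGE-N F-measure using lowercase whitespace tokenization. It is not the scorer used for this card: the function names and the example sentences are hypothetical, and a standard ROUGE implementation applies its own tokenization and stemming, so its numbers will differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(prediction, reference, n=1):
    """Simplified ROUGE-N F-measure on a 0-100 scale (illustrative only)."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical example pair, not taken from the xsum data.
prediction = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(round(rouge_n_f1(prediction, reference, n=1), 2))
print(round(rouge_n_f1(prediction, reference, n=2), 2))
```

Rougel and Rougelsum are based on the longest common subsequence rather than fixed-length n-grams, so they are not covered by this sketch.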

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
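
To make the constant_with_warmup setting concrete, the sketch below reproduces the shape of that schedule in plain Python: the learning rate rises linearly from 0 to 0.0001 over the warmup steps and then stays constant. The total step count is an assumption, not a value stated on this card; it is estimated from the Epoch and Step columns of the training results (500 steps is about 0.0392 epoch, so roughly 12,752 steps per epoch over 5 epochs). The helper names are hypothetical.

```python
import math

BASE_LR = 1e-4        # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 63_760  # ASSUMPTION: estimated as 5 epochs x ~12,752 steps per epoch


def warmup_steps(total_steps=TOTAL_STEPS, ratio=WARMUP_RATIO):
    """Number of warmup steps implied by the warmup ratio."""
    return math.ceil(total_steps * ratio)


def lr_at(step, total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then constant."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return BASE_LR * step / max(1, warmup)
    return BASE_LR


# Print the rate at a few steps to show the ramp and the plateau.
for step in (0, 1_000, 3_000, warmup_steps(), 30_000, TOTAL_STEPS):
    print(step, lr_at(step))
```

Under this estimate, warmup covers roughly the first 6,400 steps, which is about the first half of epoch 1.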

Training results

Training Loss Epoch Step Validation Loss Rouge1 Rouge2 Rougel Rougelsum Gen Len
1.82 0.0392 500 1.8345 41.7433 18.5369 33.7425 33.6805 26.684
1.7246 0.0784 1000 1.8309 42.0987 18.6042 33.9385 33.8668 27.522
1.676 0.1176 1500 1.8261 41.8464 18.4644 33.7353 33.6593 27.432
1.7185 0.1568 2000 1.8223 41.8182 18.7745 33.7279 33.669 26.978
1.7767 0.1960 2500 1.8303 41.9122 18.6646 34.3573 34.262 26.442
1.6416 0.2353 3000 1.8244 42.2415 18.9175 34.1414 34.026 26.47
1.7705 0.2745 3500 1.8318 41.8113 18.6874 34.052 34.0217 26.824
1.8043 0.3137 4000 1.8327 41.9095 18.343 33.7389 33.635 27.018
1.7164 0.3529 4500 1.8373 42.2073 18.938 34.4546 34.4091 26.538
1.7164 0.3921 5000 1.8524 42.2046 19.0726 34.219 34.0978 26.24
1.6897 0.4313 5500 1.8453 42.3857 19.0415 34.2471 34.1511 27.594
1.7171 0.4705 6000 1.8659 41.6874 18.5619 33.7171 33.6353 26.166
1.8613 0.5097 6500 1.8533 42.2562 19.1597 33.9973 33.9416 27.216
1.7706 0.5489 7000 1.8701 41.6834 18.1592 33.1219 33.0595 27.194
1.7529 0.5881 7500 1.8466 42.0631 19.1716 34.4291 34.3592 25.944
1.8003 0.6274 8000 1.8578 41.9957 18.7281 34.1092 34.0289 26.854
1.7624 0.6666 8500 1.8595 41.4516 18.2924 33.8007 33.7173 26.754
1.8178 0.7058 9000 1.8566 41.5687 18.4522 33.9984 33.8803 27.554
1.8007 0.7450 9500 1.8582 41.8678 18.5614 33.7731 33.7031 27.19
1.7711 0.7842 10000 1.8518 41.9306 18.821 34.1129 34.0778 27.374
1.7417 0.8234 10500 1.8528 41.7366 18.4769 33.7507 33.6952 27.47
1.7875 0.8626 11000 1.8449 41.8801 18.8616 33.9489 33.837 26.49
1.7787 0.9018 11500 1.8541 41.3488 18.3143 33.523 33.4378 26.69
1.8026 0.9410 12000 1.8450 42.2795 18.4335 33.5961 33.5235 27.52
1.7524 0.9802 12500 1.8515 42.0587 18.956 34.3071 34.2447 26.438
1.6827 1.0194 13000 1.8579 42.0162 18.7528 34.0648 34.0173 27.164
1.6555 1.0587 13500 1.8637 42.1583 19.1618 34.1505 34.0614 27.606
1.6297 1.0979 14000 1.8512 42.1112 19.1284 34.3205 34.2765 26.77
1.7276 1.1371 14500 1.8612 41.8449 18.6131 34.0457 33.9888 27.016
1.7225 1.1763 15000 1.8520 41.6258 18.6595 33.3636 33.3067 27.694
1.7439 1.2155 15500 1.8546 41.9277 19.0431 34.2722 34.1557 27.328
1.6621 1.2547 16000 1.8727 42.0996 18.7612 34.0036 33.9359 26.746
1.7341 1.2939 16500 1.8627 42.4602 19.1932 34.4751 34.3949 26.43
1.8223 1.3331 17000 1.8509 42.4514 19.101 34.3455 34.2848 28.548
1.7453 1.3723 17500 1.8611 42.5151 19.2789 34.7592 34.6541 26.67
1.6724 1.4115 18000 1.8675 42.1884 18.8199 34.484 34.4043 27.712
1.6722 1.4508 18500 1.8510 42.1319 19.1719 34.1064 34.0313 27.498
1.774 1.4900 19000 1.8549 41.759 18.5662 33.8042 33.7887 27.83
1.8215 1.5292 19500 1.8384 42.6817 19.2857 34.3946 34.3357 28.848
1.6981 1.5684 20000 1.8451 42.3826 19.0729 34.0673 34.0398 27.424
1.7384 1.6076 20500 1.8430 42.2756 19.0932 34.1714 34.0994 27.17
1.6954 1.6468 21000 1.8441 41.6839 18.5455 33.6478 33.6412 27.11
1.6953 1.6860 21500 1.8437 42.3576 19.2167 34.3028 34.2274 26.832
1.738 1.7252 22000 1.8448 42.6272 19.4012 34.5034 34.4339 27.842
1.7296 1.7644 22500 1.8391 42.094 19.1504 34.3706 34.293 27.316
1.7322 1.8036 23000 1.8428 41.7726 18.4484 33.5719 33.5336 26.942
1.7898 1.8428 23500 1.8517 41.9235 18.946 34.1103 34.0349 27.128
1.7466 1.8821 24000 1.8461 42.1971 19.0842 34.0274 33.928 27.046
1.7316 1.9213 24500 1.8407 42.5489 19.1149 34.0085 33.9547 27.342
1.7942 1.9605 25000 1.8342 41.5821 18.5414 33.8051 33.7498 27.452
1.7265 1.9997 25500 1.8281 42.4841 19.3053 34.6234 34.5556 27.04
1.5864 2.0389 26000 1.8580 41.8429 18.4342 33.55 33.5009 27.658
1.6329 2.0781 26500 1.8584 41.3693 18.5609 33.2237 33.1768 27.654
1.6196 2.1173 27000 1.8475 41.6415 18.5115 34.0055 33.9529 27.148
1.5914 2.1565 27500 1.8508 41.7924 18.7119 33.9297 33.8802 27.242
1.5994 2.1957 28000 1.8596 41.9335 18.9003 34.0139 33.9547 26.492
1.6694 2.2349 28500 1.8460 41.9046 18.7057 34.0046 33.9651 27.562
1.5601 2.2742 29000 1.8478 42.4264 18.9568 34.0898 34.0378 27.054
1.6992 2.3134 29500 1.8454 42.1638 19.069 34.2749 34.1919 27.218
1.6055 2.3526 30000 1.8398 42.4919 19.1231 34.0751 34.0269 28.09
1.6155 2.3918 30500 1.8569 42.192 18.9854 34.2546 34.1864 27.106
1.648 2.4310 31000 1.8513 42.3083 18.9035 34.135 34.0683 27.408
1.6233 2.4702 31500 1.8614 42.3134 19.1576 34.3097 34.1989 27.34
1.6563 2.5094 32000 1.8590 42.3082 18.6551 34.4782 34.4115 26.862
1.6425 2.5486 32500 1.8570 41.7806 18.3501 33.7608 33.7029 28.21
1.6308 2.5878 33000 1.8645 42.1087 18.7599 34.0132 33.9411 27.044
1.674 2.6270 33500 1.8499 41.9241 18.8206 33.9823 33.9304 27.204
1.6666 2.6662 34000 1.8484 42.2197 18.9401 34.177 34.1715 27.01
1.6879 2.7055 34500 1.8517 42.1009 19.2465 34.2592 34.2487 27.414
1.6554 2.7447 35000 1.8464 42.1655 18.9314 33.5946 33.4949 27.298
1.6575 2.7839 35500 1.8578 42.5483 19.3789 34.4704 34.3977 26.574
1.6167 2.8231 36000 1.8487 42.1633 19.0725 34.2673 34.161 27.398
1.6854 2.8623 36500 1.8365 41.6119 18.4366 33.5347 33.5198 26.808
1.617 2.9015 37000 1.8376 42.3427 18.9181 34.064 33.9604 27.714
1.7227 2.9407 37500 1.8425 42.2708 18.7939 33.8753 33.8002 28.762
1.653 2.9799 38000 1.8414 42.8082 19.1209 34.4406 34.36 27.004
1.5761 3.0191 38500 1.8588 42.8093 19.384 34.2343 34.1453 28.6
1.5317 3.0583 39000 1.8641 42.0878 18.8182 33.8261 33.7462 26.564
1.516 3.0976 39500 1.8723 42.0749 18.8889 34.0372 33.9587 27.464
1.5598 3.1368 40000 1.8668 42.6486 19.4263 34.5354 34.4585 27.172
1.5217 3.1760 40500 1.8703 42.2621 19.0664 34.1319 34.1065 27.778
1.534 3.2152 41000 1.8584 43.1824 20.0366 35.0429 34.9641 27.236
1.5295 3.2544 41500 1.8700 42.4558 18.9938 34.2622 34.2234 26.994
1.5092 3.2936 42000 1.8605 42.3967 19.2333 34.5747 34.5123 27.602
1.5255 3.3328 42500 1.8501 42.796 19.4993 34.6254 34.6039 27.632
1.6231 3.3720 43000 1.8599 42.4521 18.9434 34.2617 34.2183 27.7
1.574 3.4112 43500 1.8637 42.6093 19.4095 34.4866 34.4361 27.032
1.5409 3.4504 44000 1.8746 42.2728 19.3077 34.49 34.4106 27.058
1.5353 3.4896 44500 1.8431 42.5131 19.2332 34.3244 34.2416 27.996
1.5552 3.5289 45000 1.8612 42.3807 19.4296 34.223 34.1504 27.824
1.5966 3.5681 45500 1.8614 42.6764 19.6147 34.3084 34.2632 28.008
1.573 3.6073 46000 1.8581 42.2219 19.0877 34.3519 34.2604 27.048
1.551 3.6465 46500 1.8619 42.8421 19.7374 34.7647 34.7236 27.59
1.5675 3.6857 47000 1.8596 42.2733 18.9215 34.009 33.9305 28.162
1.6038 3.7249 47500 1.8443 42.393 19.6113 34.5892 34.5415 27.336
1.5423 3.7641 48000 1.8635 42.5436 19.1268 34.0135 33.9472 27.694
1.5919 3.8033 48500 1.8512 42.7832 19.4323 34.3718 34.3138 27.768
1.6734 3.8425 49000 1.8340 42.4655 19.0418 34.1518 34.0744 28.6
1.5669 3.8817 49500 1.8325 42.6709 19.6843 34.631 34.5934 27.218
1.5936 3.9210 50000 1.8419 42.8155 19.6704 34.5919 34.529 27.124
1.6364 3.9602 50500 1.8350 42.8325 19.8657 35.0955 35.0193 26.862
1.573 3.9994 51000 1.8402 42.1256 19.2382 34.2805 34.2008 27.636
1.4155 4.0386 51500 1.8670 42.8 19.5408 34.7289 34.6675 28.644
1.5129 4.0778 52000 1.8700 42.7375 19.1769 34.4609 34.4479 27.688
1.3889 4.1170 52500 1.8780 42.6974 19.4927 34.6657 34.6225 26.986
1.5176 4.1562 53000 1.8822 42.5666 19.4135 34.3193 34.2814 27.448
1.4781 4.1954 53500 1.8586 42.4382 19.5865 34.6524 34.6641 27.504
1.5111 4.2346 54000 1.8636 42.4604 19.2144 34.26 34.2266 27.536
1.4839 4.2738 54500 1.8645 42.4237 19.3525 34.394 34.3567 28.072
1.5513 4.3130 55000 1.8761 42.972 19.5635 34.9258 34.8552 27.478
1.43 4.3523 55500 1.8648 42.8976 20.116 35.3021 35.2536 26.828
1.4921 4.3915 56000 1.8684 42.3922 19.3206 34.6468 34.6143 28.68
1.4981 4.4307 56500 1.8712 42.7565 19.7842 34.723 34.6904 27.466
1.4934 4.4699 57000 1.8807 43.1096 20.221 34.9533 34.8881 27.688
1.5839 4.5091 57500 1.8659 41.8117 18.6804 33.9455 33.8418 27.564
1.5576 4.5483 58000 1.8668 42.8675 19.5292 34.2995 34.249 28.388
1.4599 4.5875 58500 1.8619 42.2905 19.3224 34.575 34.5091 27.492
1.5602 4.6267 59000 1.8629 43.0611 19.6384 34.5437 34.5254 26.876
1.4867 4.6659 59500 1.8608 42.7051 19.4958 34.9163 34.8655 28.002
1.4744 4.7051 60000 1.8660 42.7231 19.363 34.4738 34.4797 27.804
1.5493 4.7444 60500 1.8583 42.3275 19.3555 34.6037 34.5347 27.404
1.5281 4.7836 61000 1.8550 42.3127 18.987 34.2098 34.181 27.62
1.5368 4.8228 61500 1.8562 42.723 19.3495 34.653 34.5672 27.246
1.5668 4.8620 62000 1.8488 41.9763 18.8193 33.6753 33.618 27.632
1.5354 4.9012 62500 1.8494 42.6601 19.2107 34.3055 34.2831 27.256
1.5407 4.9404 63000 1.8469 42.5046 19.0006 34.043 33.9565 27.756
1.5312 4.9796 63500 1.8559 42.7468 19.3673 34.4785 34.4349 28.256
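
Because the table is whitespace-separated with nine numeric fields per row, it can be analysed with a few lines of Python, for example to find the evaluation step with the lowest validation loss or the highest Rouge2. The sketch below is one way to do that; the column keys and function name are hypothetical, and only four rows copied from the table are embedded to keep the example short.

```python
COLUMNS = ["train_loss", "epoch", "step", "val_loss",
           "rouge1", "rouge2", "rougel", "rougelsum", "gen_len"]

# A few rows copied verbatim from the table above; paste all rows to analyse the full run.
SAMPLE_ROWS = """\
1.7185 0.1568 2000 1.8223 41.8182 18.7745 33.7279 33.669 26.978
1.534 3.2152 41000 1.8584 43.1824 20.0366 35.0429 34.9641 27.236
1.4934 4.4699 57000 1.8807 43.1096 20.221 34.9533 34.8881 27.688
1.5312 4.9796 63500 1.8559 42.7468 19.3673 34.4785 34.4349 28.256
"""


def parse_rows(text):
    """Parse whitespace-separated result rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank lines and the multi-word header line
        row = dict(zip(COLUMNS, map(float, fields)))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE_ROWS)
best_loss = min(rows, key=lambda r: r["val_loss"])
best_rouge2 = max(rows, key=lambda r: r["rouge2"])
print(best_loss["step"], best_rouge2["step"])  # prints "2000 57000" for the sample rows
```

In these sample rows the lowest validation loss and the highest Rouge2 fall at different steps, so the choice of checkpoint depends on which metric is used for selection.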

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
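
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the usual PyPI distribution names (PyTorch is installed as "torch").

```python
from importlib import metadata

# Versions listed on the card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.41.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=CARD_VERSIONS):
    """Map each package to a tuple of (expected, installed, exact match)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, ok) in version_report().items():
    print(f"{package}: expected {wanted}, installed {found}, match={ok}")
```

An exact match is a strict check; nearby versions may well work, but the card gives no information on that.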

Model size: 1.07B params (Safetensors, tensor type F32)

Dataset used to train taehyunzzz/switch-base-16-xsum-ba16-lr1e-04-top-4-choose-1: xsum