metadata
license: apache-2.0
base_model: google/long-t5-tglobal-base
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: Word-selector
    results: []

Word-selector

This model is a fine-tuned version of google/long-t5-tglobal-base on an unknown dataset. After the final (30th) training epoch, it achieves the following results on the evaluation set:

  • Loss: 4.5118
  • Rouge1: 0.3547
  • Rouge2: 0.0761
  • RougeL: 0.2663
  • RougeLsum: 0.2667
  • Gen Len (average generated length): 25.195
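ROUGE-N scores the overlap of n-grams between a generated text and a reference. The sketch below computes ROUGE-N F1 for a single prediction/reference pair, assuming plain lowercased whitespace tokenization. The card does not say which implementation produced the numbers above; the Hugging Face `evaluate` ROUGE metric (built on the `rouge_score` package) is the usual choice with the Trainer, but that is an assumption here, and it normalizes text differently and can apply stemming. ROUGE-L, which is based on the longest common subsequence, is not shown.

```python
from collections import Counter


def rouge_n_f1(prediction: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: lowercased whitespace tokens, no stemming."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        # Count every contiguous n-gram; texts shorter than n yield an empty Counter.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(round(rouge_n_f1("the cat sat on the mat", "the cat lay on the mat"), 4))  # 0.8333
```

The values reported in the card are aggregated over the whole evaluation set rather than computed for a single pair.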

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 12
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 30
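The hyperparameters above, together with the step counts in the training results (400 steps per epoch, 12,000 in total), pin down the learning-rate schedule. The sketch below assumes a single device, no gradient accumulation and zero warmup steps (none of these are listed in the card; zero warmup is the Trainer default). The decay formula mirrors the one used by Transformers' `get_linear_schedule_with_warmup`.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int = 16) -> int:
    """Optimizer steps per epoch on one device with no gradient accumulation."""
    # The last partial batch is kept (dataloader_drop_last=False is the Trainer default).
    return math.ceil(num_examples / batch_size)


def linear_lr(step: int, base_lr: float = 2e-4, total_steps: int = 12_000,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under a linear warmup-then-decay schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup (unused when warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup down to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# 400 steps per epoch at batch size 16 implies between 6,385 and 6,400 training examples.
print([n for n in (6384, 6385, 6400, 6401) if steps_per_epoch(n) == 400])  # [6385, 6400]

for step in (0, 3_000, 6_000, 12_000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate halves to 1e-4 at step 6,000 (epoch 15) and reaches 0 at step 12,000, the last step in the results table.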

Training results

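| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| No log | 1.0 | 400 | 3.9221 | 0.2104 | 0.0295 | 0.1684 | 0.1684 | 34.3531 |
| 4.5051 | 2.0 | 800 | 3.7571 | 0.285 | 0.0449 | 0.2195 | 0.2197 | 20.2419 |
| 3.9507 | 3.0 | 1200 | 3.6847 | 0.2976 | 0.0513 | 0.2309 | 0.2311 | 22.7119 |
| 3.6575 | 4.0 | 1600 | 3.6350 | 0.3137 | 0.0595 | 0.2411 | 0.2411 | 25.9231 |
| 3.4177 | 5.0 | 2000 | 3.6229 | 0.3311 | 0.0636 | 0.2527 | 0.2527 | 22.3788 |
| 3.4177 | 6.0 | 2400 | 3.6223 | 0.3359 | 0.0658 | 0.254 | 0.2543 | 21.2994 |
| 3.1741 | 7.0 | 2800 | 3.6313 | 0.3453 | 0.0674 | 0.2617 | 0.2618 | 21.9181 |
| 3.013 | 8.0 | 3200 | 3.6278 | 0.3453 | 0.0689 | 0.2649 | 0.2651 | 22.93 |
| 2.8253 | 9.0 | 3600 | 3.6755 | 0.3511 | 0.0705 | 0.2658 | 0.2662 | 23.1806 |
| 2.6705 | 10.0 | 4000 | 3.7081 | 0.3509 | 0.0742 | 0.2663 | 0.2664 | 22.5356 |
| 2.6705 | 11.0 | 4400 | 3.7424 | 0.3528 | 0.0716 | 0.264 | 0.2643 | 23.3775 |
| 2.5081 | 12.0 | 4800 | 3.8135 | 0.3553 | 0.0753 | 0.2686 | 0.2686 | 22.985 |
| 2.3745 | 13.0 | 5200 | 3.8369 | 0.3548 | 0.0753 | 0.2671 | 0.2675 | 23.7719 |
| 2.2399 | 14.0 | 5600 | 3.8816 | 0.3591 | 0.0762 | 0.2708 | 0.2709 | 23.1612 |
| 2.1414 | 15.0 | 6000 | 3.9132 | 0.361 | 0.0781 | 0.2719 | 0.2721 | 24.4581 |
| 2.1414 | 16.0 | 6400 | 3.9946 | 0.3579 | 0.077 | 0.2715 | 0.2714 | 23.2131 |
| 2.0099 | 17.0 | 6800 | 4.0376 | 0.3595 | 0.0766 | 0.2701 | 0.2703 | 23.6681 |
| 1.9252 | 18.0 | 7200 | 4.0829 | 0.3576 | 0.0774 | 0.2691 | 0.2694 | 23.79 |
| 1.8406 | 19.0 | 7600 | 4.1218 | 0.3613 | 0.0776 | 0.2718 | 0.272 | 23.9888 |
| 1.7602 | 20.0 | 8000 | 4.1754 | 0.3588 | 0.0787 | 0.2702 | 0.2704 | 24.5425 |
| 1.7602 | 21.0 | 8400 | 4.2440 | 0.3602 | 0.0769 | 0.2716 | 0.2717 | 24.9531 |
| 1.6725 | 22.0 | 8800 | 4.2860 | 0.3581 | 0.0775 | 0.2688 | 0.2691 | 24.6638 |
| 1.6036 | 23.0 | 9200 | 4.3163 | 0.3582 | 0.0764 | 0.2697 | 0.27 | 24.5994 |
| 1.5572 | 24.0 | 9600 | 4.3655 | 0.3545 | 0.0749 | 0.2655 | 0.2658 | 25.145 |
| 1.5034 | 25.0 | 10000 | 4.3811 | 0.3583 | 0.0781 | 0.2695 | 0.2698 | 25.6856 |
| 1.5034 | 26.0 | 10400 | 4.4350 | 0.3593 | 0.0788 | 0.2691 | 0.2692 | 25.2394 |
| 1.4617 | 27.0 | 10800 | 4.4539 | 0.357 | 0.078 | 0.2686 | 0.269 | 25.2906 |
| 1.4175 | 28.0 | 11200 | 4.4785 | 0.3549 | 0.0757 | 0.2657 | 0.2661 | 25.62 |
| 1.3971 | 29.0 | 11600 | 4.5061 | 0.3567 | 0.0767 | 0.2661 | 0.2665 | 25.1988 |
| 1.3828 | 30.0 | 12000 | 4.5118 | 0.3547 | 0.0761 | 0.2663 | 0.2667 | 25.195 |

The Training Loss column shows the most recently logged training loss at the end of each epoch. Since each epoch is 400 steps, "No log" at epoch 1 and the repeated values (for example at epochs 5 and 6) are consistent with the Trainer's default logging interval of 500 steps.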
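The summary at the top of the card matches the epoch 30 row, but validation loss and the ROUGE scores peak at different epochs. The sketch below picks the best epoch per metric from the training results table (values copied verbatim); selecting a checkpoint this way corresponds to the Trainer's `load_best_model_at_end` and `metric_for_best_model` options, which the card does not say were used.

```python
from typing import List, Tuple

# Per-epoch validation metrics copied from the training results table.
# Columns: (epoch, validation_loss, rouge1, rouge2, rougeL)
RESULTS: List[Tuple[int, float, float, float, float]] = [
    (1, 3.9221, 0.2104, 0.0295, 0.1684), (2, 3.7571, 0.285, 0.0449, 0.2195),
    (3, 3.6847, 0.2976, 0.0513, 0.2309), (4, 3.6350, 0.3137, 0.0595, 0.2411),
    (5, 3.6229, 0.3311, 0.0636, 0.2527), (6, 3.6223, 0.3359, 0.0658, 0.254),
    (7, 3.6313, 0.3453, 0.0674, 0.2617), (8, 3.6278, 0.3453, 0.0689, 0.2649),
    (9, 3.6755, 0.3511, 0.0705, 0.2658), (10, 3.7081, 0.3509, 0.0742, 0.2663),
    (11, 3.7424, 0.3528, 0.0716, 0.264), (12, 3.8135, 0.3553, 0.0753, 0.2686),
    (13, 3.8369, 0.3548, 0.0753, 0.2671), (14, 3.8816, 0.3591, 0.0762, 0.2708),
    (15, 3.9132, 0.361, 0.0781, 0.2719), (16, 3.9946, 0.3579, 0.077, 0.2715),
    (17, 4.0376, 0.3595, 0.0766, 0.2701), (18, 4.0829, 0.3576, 0.0774, 0.2691),
    (19, 4.1218, 0.3613, 0.0776, 0.2718), (20, 4.1754, 0.3588, 0.0787, 0.2702),
    (21, 4.2440, 0.3602, 0.0769, 0.2716), (22, 4.2860, 0.3581, 0.0775, 0.2688),
    (23, 4.3163, 0.3582, 0.0764, 0.2697), (24, 4.3655, 0.3545, 0.0749, 0.2655),
    (25, 4.3811, 0.3583, 0.0781, 0.2695), (26, 4.4350, 0.3593, 0.0788, 0.2691),
    (27, 4.4539, 0.357, 0.078, 0.2686), (28, 4.4785, 0.3549, 0.0757, 0.2657),
    (29, 4.5061, 0.3567, 0.0767, 0.2661), (30, 4.5118, 0.3547, 0.0761, 0.2663),
]

# Column index of each metric inside a RESULTS row.
COLUMNS = {"validation_loss": 1, "rouge1": 2, "rouge2": 3, "rougeL": 4}


def best_epoch(metric: str, results=RESULTS) -> Tuple[int, float]:
    """Return (epoch, value) of the best row: lowest loss, or highest ROUGE."""
    idx = COLUMNS[metric]
    pick = min if metric == "validation_loss" else max
    row = pick(results, key=lambda r: r[idx])
    return row[0], row[idx]


for name in COLUMNS:
    print(name, best_epoch(name))
```

On these values it reports epoch 6 for validation loss (3.6223), epoch 19 for ROUGE-1 (0.3613), epoch 26 for ROUGE-2 (0.0788) and epoch 15 for ROUGE-L (0.2719). Validation loss rising after epoch 6 while training loss keeps falling suggests overfitting, even though ROUGE stays roughly flat in the later epochs.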

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.1.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.15.1