mt5_keep_training

This model is a fine-tuned version of kyle0518/mt5_baseline on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3480
  • Rouge-1: R 0.2419 / P 0.3075 / F 0.2611
  • Rouge-2: R 0.0932 / P 0.1136 / F 0.0989
  • Rouge-l: R 0.2162 / P 0.2753 / F 0.2334
  • Gen Len: 20.4044
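
The training task is not documented on this card; given the mT5 base and an average generation length of about 20 tokens, the model presumably produces short output sequences (e.g. summaries or titles). A minimal inference sketch, assuming the repo id kyle0518/mt5_keep_training (inferred from the card title and the base model's namespace):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id is an assumption inferred from the card title; adjust if it differs.
model_id = "kyle0518/mt5_keep_training"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Input document goes here."  # the training data is undocumented
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```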

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 10.0
  • mixed_precision_training: Native AMP
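
For reference, these values map onto transformers' Seq2SeqTrainingArguments roughly as sketched below. output_dir and the epoch-level evaluation strategy are assumptions, and the two-GPU distributed setup is handled by the launcher rather than by these arguments; Adam's betas and epsilon match the library defaults.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5_keep_training",   # assumption
    learning_rate=1e-5,
    per_device_train_batch_size=1,    # 1 per device x 2 GPUs x 8 accumulation = 16 total
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=10.0,
    fp16=True,                        # "Native AMP" mixed precision
    evaluation_strategy="epoch",      # assumption, consistent with the per-epoch results below
    predict_with_generate=True,
)
```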

Training results

(ROUGE scores are shown as recall/precision/F1, rounded to four decimals.)

Training Loss  Epoch  Step   Validation Loss  Rouge-1 (r/p/f)       Rouge-2 (r/p/f)       Rouge-l (r/p/f)       Gen Len
3.5447         1.0    1221   3.3790           0.2378/0.3044/0.2571  0.0906/0.1111/0.0961  0.2137/0.2738/0.2309  20.5440
3.5237         2.0    2442   3.3656           0.2372/0.3035/0.2564  0.0909/0.1107/0.0962  0.2127/0.2720/0.2297  20.4450
3.5214         3.0    3663   3.3586           0.2390/0.3042/0.2579  0.0910/0.1110/0.0965  0.2141/0.2731/0.2311  20.4749
3.5134         4.0    4884   3.3520           0.2395/0.3071/0.2595  0.0917/0.1120/0.0974  0.2141/0.2751/0.2320  20.3257
3.4993         5.0    6105   3.3530           0.2405/0.3065/0.2599  0.0924/0.1127/0.0981  0.2155/0.2750/0.2329  20.3947
3.4818         6.0    7326   3.3510           0.2399/0.3060/0.2593  0.0917/0.1121/0.0973  0.2144/0.2737/0.2316  20.3197
3.4701         7.0    8547   3.3478           0.2420/0.3088/0.2616  0.0936/0.1143/0.0994  0.2166/0.2766/0.2340  20.3832
3.4628         8.0    9768   3.3494           0.2416/0.3072/0.2607  0.0931/0.1136/0.0987  0.2162/0.2752/0.2332  20.4224
3.4575         9.0    10989  3.3482           0.2416/0.3070/0.2607  0.0930/0.1133/0.0986  0.2161/0.2748/0.2331  20.4127
3.4830         10.0   12210  3.3480           0.2419/0.3075/0.2611  0.0932/0.1136/0.0989  0.2162/0.2753/0.2334  20.4044
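
The r/p/f score dictionaries reported above match the output format of the rouge Python package; a minimal sketch of how such scores can be computed, assuming that package (the actual evaluation pipeline is not documented):

```python
from rouge import Rouge  # pip install rouge

# Hypothetical inputs; the evaluation dataset is not documented.
predictions = ["a generated summary"]
references = ["the reference summary"]

scores = Rouge().get_scores(predictions, references, avg=True)
# scores = {'rouge-1': {'r': ..., 'p': ..., 'f': ...},
#           'rouge-2': {...}, 'rouge-l': {...}}
print(scores["rouge-1"]["f"])
```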

Framework versions

  • Transformers 4.18.0.dev0
  • Pytorch 2.0.0
  • Datasets 2.14.5
  • Tokenizers 0.12.1