mt5_baseline
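This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set (these match the epoch 9 row, step 10989, of the training results table, which has the lowest validation loss):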
- Loss: 3.0995
- Rouge1: 17.5672
- Rouge2: 5.8375
- Rougel: 17.4382
- Rougelsum: 17.4005
- Gen Len: 20.9710
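For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated text and a reference. The sketch below is a simplified illustration using whitespace tokenization. It is not the evaluation code used for this model, which this card does not give. The helper name `rouge1_f1` and the example strings are hypothetical, and the result is scaled by 100 on the assumption that the scores above use a 0 to 100 scale.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 on a 0-100 scale (hypothetical helper).

    Assumes lowercase whitespace tokenization; real ROUGE implementations
    apply their own tokenization and optional stemming.
    """
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


# Made-up example strings, not drawn from the model's data.
score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(f"{score:.2f}")
```

Five of the six unigrams overlap in this example, so precision and recall are both 5/6.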
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 10.0
- mixed_precision_training: Native AMP
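The derived values in this list follow from simple arithmetic. The sketch below recomputes the effective training batch size and traces a cosine schedule with linear warmup. It assumes the scheduler follows the usual pattern of linear warmup followed by a half-cosine decay to zero, and that the 12210 steps reported in the training results are optimizer steps. The helper name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 1
num_devices = 2
gradient_accumulation_steps = 8
peak_lr = 1e-4
warmup_steps = 100
# Final step in the training results table (assumed to be optimizer steps).
total_steps = 12210

# Effective batch size per optimizer step.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup from 0 to peak_lr over warmup_steps, then a
    half-cosine decay to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 50, 100, 6155, 12210):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 100, falls to half its peak midway through the decay (step 6155), and reaches zero at the final step.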
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
4.3576 | 1.0 | 1221 | 3.4988 | 12.4331 | 4.4297 | 12.2911 | 12.2479 | 18.4427 |
3.8587 | 2.0 | 2442 | 3.3121 | 14.8826 | 5.0815 | 14.7234 | 14.6877 | 19.6117 |
3.5957 | 3.0 | 3663 | 3.2360 | 15.0309 | 5.4261 | 14.9197 | 14.9144 | 20.4196 |
3.4707 | 4.0 | 4884 | 3.1737 | 16.2314 | 5.5413 | 16.008 | 16.0567 | 20.0129 |
3.363 | 5.0 | 6105 | 3.1553 | 16.3779 | 5.6995 | 16.2084 | 16.2036 | 20.9977 |
3.2796 | 6.0 | 7326 | 3.1189 | 17.0323 | 5.9413 | 16.9583 | 16.9213 | 20.2460 |
3.227 | 7.0 | 8547 | 3.1143 | 17.4254 | 6.0767 | 17.3036 | 17.3029 | 20.8706 |
3.1937 | 8.0 | 9768 | 3.1013 | 17.523 | 5.8829 | 17.398 | 17.3977 | 20.8485 |
3.1687 | 9.0 | 10989 | 3.0995 | 17.5672 | 5.8375 | 17.4382 | 17.4005 | 20.9710 |
3.177 | 10.0 | 12210 | 3.1020 | 17.6888 | 5.8978 | 17.5756 | 17.5283 | 20.9111 |
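The table can be checked against the headline evaluation results with a few lines of Python. The sketch below copies four columns from the table and finds the rows with the lowest validation loss and the highest Rouge1. Whether the reported checkpoint was actually selected by validation loss is an assumption; the card does not say.

```python
# (epoch, step, validation_loss, rouge1) copied from the training results table.
rows = [
    (1, 1221, 3.4988, 12.4331),
    (2, 2442, 3.3121, 14.8826),
    (3, 3663, 3.2360, 15.0309),
    (4, 4884, 3.1737, 16.2314),
    (5, 6105, 3.1553, 16.3779),
    (6, 7326, 3.1189, 17.0323),
    (7, 8547, 3.1143, 17.4254),
    (8, 9768, 3.1013, 17.5230),
    (9, 10989, 3.0995, 17.5672),
    (10, 12210, 3.1020, 17.6888),
]

# Row with the lowest validation loss and row with the highest Rouge1.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_rouge1 = max(rows, key=lambda r: r[3])

# Each epoch spans the same number of steps.
steps_per_epoch = {step // epoch for epoch, step, _, _ in rows}

print("lowest validation loss: epoch", best_by_loss[0])
print("highest Rouge1: epoch", best_by_rouge1[0])
print("steps per epoch:", steps_per_epoch)
```

The two criteria pick different epochs: validation loss is lowest at epoch 9, while Rouge1 is marginally higher at epoch 10.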
Framework versions
- Transformers 4.18.0.dev0
- Pytorch 2.0.0
- Datasets 2.14.5
- Tokenizers 0.12.1