---
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-zero-shot-headers-and-better-prompt-enriched
  results: []
---

# t5-summarization-zero-shot-headers-and-better-prompt-enriched

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3132
- Rouge: {'rouge1': 0.426, 'rouge2': 0.195, 'rougeL': 0.2024, 'rougeLsum': 0.2024}
- Bert Score: 0.877
- Bleurt 20: -0.8149
- Gen Len: 13.66

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 7
- eval_batch_size: 7
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-----:|:----------:|:---------:|:-------:|
| 2.8785 | 1.0 | 172 | 2.6476 | {'rouge1': 0.462, 'rouge2': 0.1848, 'rougeL': 0.1845, 'rougeLsum': 0.1845} | 0.8707 | -0.8319 | 15.17 |
| 2.6366 | 2.0 | 344 | 2.4685 | {'rouge1': 0.4501, 'rouge2': 0.1849, 'rougeL': 0.1933, 'rougeLsum': 0.1933} | 0.872 | -0.8531 | 14.545 |
| 2.3822 | 3.0 | 516 | 2.3766 | {'rouge1': 0.4217, 'rouge2': 0.1759, 'rougeL': 0.1867, 'rougeLsum': 0.1867} | 0.8719 | -0.8998 | 13.675 |
| 2.2235 | 4.0 | 688 | 2.3262 | {'rouge1': 0.4396, 'rouge2': 0.1832, 'rougeL': 0.1867, 'rougeLsum': 0.1867} | 0.8715 | -0.8847 | 14.38 |
| 2.0765 | 5.0 | 860 | 2.3122 | {'rouge1': 0.4143, 'rouge2': 0.1769, 'rougeL': 0.1907, 'rougeLsum': 0.1907} | 0.875 | -0.9206 | 13.37 |
| 2.0141 | 6.0 | 1032 | 2.2993 | {'rouge1': 0.4257, 'rouge2': 0.1867, 'rougeL': 0.1943, 'rougeLsum': 0.1943} | 0.8773 | -0.8751 | 13.555 |
| 1.9087 | 7.0 | 1204 | 2.2855 | {'rouge1': 0.4236, 'rouge2': 0.1858, 'rougeL': 0.1895, 'rougeLsum': 0.1895} | 0.8774 | -0.87 | 13.255 |
| 1.868 | 8.0 | 1376 | 2.2795 | {'rouge1': 0.4298, 'rouge2': 0.1896, 'rougeL': 0.1956, 'rougeLsum': 0.1956} | 0.877 | -0.8837 | 13.65 |
| 1.8063 | 9.0 | 1548 | 2.2802 | {'rouge1': 0.4427, 'rouge2': 0.1965, 'rougeL': 0.2011, 'rougeLsum': 0.2011} | 0.8779 | -0.8358 | 13.965 |
| 1.7161 | 10.0 | 1720 | 2.2685 | {'rouge1': 0.4146, 'rouge2': 0.1828, 'rougeL': 0.1918, 'rougeLsum': 0.1918} | 0.8795 | -0.8725 | 13.155 |
| 1.7027 | 11.0 | 1892 | 2.2824 | {'rouge1': 0.423, 'rouge2': 0.1871, 'rougeL': 0.1958, 'rougeLsum': 0.1958} | 0.8781 | -0.8476 | 13.49 |
| 1.6575 | 12.0 | 2064 | 2.2888 | {'rouge1': 0.4231, 'rouge2': 0.1847, 'rougeL': 0.1939, 'rougeLsum': 0.1939} | 0.878 | -0.8648 | 13.3 |
| 1.6046 | 13.0 | 2236 | 2.2946 | {'rouge1': 0.4387, 'rouge2': 0.1942, 'rougeL': 0.1987, 'rougeLsum': 0.1987} | 0.8771 | -0.8336 | 13.835 |
| 1.5638 | 14.0 | 2408 | 2.2961 | {'rouge1': 0.4225, 'rouge2': 0.1864, 'rougeL': 0.1973, 'rougeLsum': 0.1973} | 0.8774 | -0.8456 | 13.345 |
| 1.6015 | 15.0 | 2580 | 2.2937 | {'rouge1': 0.429, 'rouge2': 0.1947, 'rougeL': 0.2007, 'rougeLsum': 0.2007} | 0.8777 | -0.8402 | 13.655 |
| 1.5146 | 16.0 | 2752 | 2.3077 | {'rouge1': 0.4208, 'rouge2': 0.1869, 'rougeL': 0.1978, 'rougeLsum': 0.1978} | 0.8751 | -0.8221 | 13.695 |
| 1.5421 | 17.0 | 2924 | 2.3094 | {'rouge1': 0.4263, 'rouge2': 0.1938, 'rougeL': 0.202, 'rougeLsum': 0.202} | 0.8759 | -0.8207 | 13.67 |
| 1.5328 | 18.0 | 3096 | 2.3114 | {'rouge1': 0.4306, 'rouge2': 0.1927, 'rougeL': 0.2006, 'rougeLsum': 0.2006} | 0.8758 | -0.8284 | 13.755 |
| 1.5181 | 19.0 | 3268 | 2.3128 | {'rouge1': 0.4298, 'rouge2': 0.196, 'rougeL': 0.1997, 'rougeLsum': 0.1997} | 0.8764 | -0.8211 | 13.77 |
| 1.4926 | 20.0 | 3440 | 2.3132 | {'rouge1': 0.426, 'rouge2': 0.195, 'rougeL': 0.2024, 'rougeLsum': 0.2024} | 0.877 | -0.8149 | 13.66 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
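## How to use

This auto-generated card does not include a usage snippet. Below is a minimal inference sketch using the standard `transformers` seq2seq API. Note the assumptions: the prompt format used during fine-tuning ("zero-shot headers and better prompt") is not documented here, so the `summarize:` prefix is only a guess, and the example text is illustrative. The base model id is used as a stand-in; substitute the fine-tuned checkpoint's repository id to reproduce the reported results.

```python
# Sketch: summarization inference with a (Flan-)T5 seq2seq checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: the base model id stands in for the fine-tuned checkpoint id.
model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = (
    "The James Webb Space Telescope has captured detailed images of a "
    "distant galaxy cluster, giving astronomers new data on early star "
    "formation."
)

# Assumption: the "summarize:" prefix; the actual fine-tuning prompt is
# not documented in this card.
inputs = tokenizer(
    "summarize: " + article,
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
# Gen Len averaged ~13.7 tokens during evaluation, so a small
# max_new_tokens budget is sufficient for headline-style summaries.
output_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```

The `num_beams=4` setting is a common default for summarization; the generation parameters used to produce the reported metrics are not recorded in this card.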