metadata
language:
  - en
license: apache-2.0
base_model: pszemraj/tFINE-base-300m
tags:
  - generated_from_trainer
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: tFINE-base-300m-samsum
    results:
      - task:
          name: Summarization
          type: summarization
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: None
          args: samsum
        metrics:
          - name: Rouge1
            type: rouge
            value: 42.3629
library_name: transformers
pipeline_tag: summarization

tFINE-base-300m-samsum

An example fine-tune of pszemraj/tFINE-base-300m for dialogue summarization on the samsum dataset. The epoch 3 checkpoint achieves the following results on the evaluation set:

  • Loss: 1.9820
  • Rouge1: 42.3629
  • Rouge2: 18.4285
  • Rougel: 34.6339
  • Rougelsum: 38.7792
  • Gen Len: 27.8033

The base model was pre-trained with a context length of 1024 tokens, and this fine-tune was trained on samsum with the same 1024-token input context.
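A minimal inference sketch using the generic seq2seq classes in transformers. The repo id `pszemraj/tFINE-base-300m-samsum` is an assumption pieced together from the model name and base-model namespace in this card; `format_dialogue`, `summarize`, and the `max_new_tokens=64` default are illustrative choices, not settings taken from the training run.

```python
# Assumed repo id (model name from this card + the base model's namespace).
MODEL_ID = "pszemraj/tFINE-base-300m-samsum"
# Input context stated in this card.
MAX_INPUT_TOKENS = 1024


def format_dialogue(turns):
    """Join (speaker, utterance) pairs into 'Speaker: text' lines (hypothetical helper)."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def summarize(dialogue, model_id=MODEL_ID, max_new_tokens=64):
    """Load the model and summarize a single dialogue string."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    # Truncate to the 1024-token context used for fine-tuning.
    inputs = tokenizer(
        dialogue,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the model on first call):
# text = format_dialogue([("Amanda", "I baked cookies."), ("Jerry", "Save me one!")])
# print(summarize(text))
```

The reported Gen Len is about 28 tokens, so a 64-token generation budget should leave headroom for most samsum-style dialogues.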

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 17868
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 4.0
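The relationship between these values can be sketched in plain Python. The snippet reproduces the total batch size (8 × 16 = 128, assuming a single device) and approximates the learning-rate curve with a linear warmup followed by a half-cosine decay. The total of 460 optimizer steps is taken from the last logged step in the training results; the exact schedule used by the trainer may differ in small details, so treat this as an illustration rather than the trainer's implementation.

```python
import math

LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05
TOTAL_STEPS = 460   # last logged step in the training results
NUM_DEVICES = 1     # assumption: consistent with 8 * 16 = 128


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def lr_at(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
          warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size())
for step in (0, 23, 115, 230, 345, 460):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Under these assumptions warmup lasts 23 steps, after which the learning rate decays smoothly to zero at step 460.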

Training results

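The epoch 3 checkpoint (step 345) is kept as the final model: it has the highest ROUGE scores of the four evaluations, even though validation loss is lowest at epoch 2.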

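| Training Loss | Epoch  | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|--------|------|-----------------|---------|---------|---------|-----------|---------|
| 1.9528        | 0.9989 | 115  | 1.9189          | 40.093  | 18.2018 | 33.9749 | 36.9071   | 29.3333 |
| 1.5346        | 1.9978 | 230  | 1.8827          | 41.4676 | 18.3467 | 34.1909 | 38.2131   | 27.6633 |
| 1.1696        | 2.9967 | 345  | 1.9820          | 42.3629 | 18.4285 | 34.6339 | 38.7792   | 27.8033 |
| 0.9359        | 3.9957 | 460  | 2.1588          | 41.2237 | 17.8161 | 33.7101 | 37.9569   | 30.18   |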
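The checkpoint choice can be checked directly from the logged values. The sketch below copies the four evaluation rows and picks the best row per metric; the `best_by` helper and the dictionary keys are illustrative names, not part of any training script.

```python
# Evaluation rows copied from the training results above.
ROWS = [
    {"step": 115, "val_loss": 1.9189, "rouge1": 40.093, "rouge2": 18.2018,
     "rougeL": 33.9749, "rougeLsum": 36.9071},
    {"step": 230, "val_loss": 1.8827, "rouge1": 41.4676, "rouge2": 18.3467,
     "rougeL": 34.1909, "rougeLsum": 38.2131},
    {"step": 345, "val_loss": 1.9820, "rouge1": 42.3629, "rouge2": 18.4285,
     "rougeL": 34.6339, "rougeLsum": 38.7792},
    {"step": 460, "val_loss": 2.1588, "rouge1": 41.2237, "rouge2": 17.8161,
     "rougeL": 33.7101, "rougeLsum": 37.9569},
]


def best_by(metric, higher_is_better=True, rows=ROWS):
    """Return the row with the best value for the given metric (hypothetical helper)."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[metric])


for metric in ("rouge1", "rouge2", "rougeL", "rougeLsum"):
    print(metric, "-> step", best_by(metric)["step"])
print("val_loss -> step", best_by("val_loss", higher_is_better=False)["step"])
```

Every ROUGE variant selects step 345, while validation loss alone would select step 230. Rising validation loss after epoch 2 alongside falling training loss suggests the model begins to overfit, which is consistent with the drop in ROUGE at epoch 4.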