metadata

language:
  - en
license: apache-2.0
base_model: pszemraj/tFINE-base-300m
tags:
  - generated_from_trainer
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: tFINE-base-300m-samsum
    results:
      - task:
          name: Summarization
          type: summarization
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: None
          args: samsum
        metrics:
          - name: Rouge1
            type: rouge
            value: 42.3629
library_name: transformers
pipeline_tag: summarization

tFINE-base-300m-samsum

An example fine-tune of pszemraj/tFINE-base-300m for summarization using the samsum dataset. It achieves the following results on the evaluation set:

Loss: 1.9820
Rouge1: 42.3629
Rouge2: 18.4285
Rougel: 34.6339
Rougelsum: 38.7792
Gen Len: 27.8033

The base model was pre-trained with a context length (CTX) of 1024 tokens, and this fine-tune on samsum also used inputs of up to 1024 tokens.
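A minimal inference sketch, assuming the checkpoint is published on the Hub under the repo id `pszemraj/tFINE-base-300m-samsum` (inferred from the model name and base model, not stated in this card) and that it loads through the standard seq2seq auto classes. The `summarize` helper and the `max_new_tokens` default are illustrative choices; inputs are truncated to the 1024-token context mentioned above.

```python
# Assumption: Hub repo id inferred from the model name in this card.
MODEL_ID = "pszemraj/tFINE-base-300m-samsum"
# Context length stated in the card for both pre-training and fine-tuning.
MAX_INPUT_TOKENS = 1024


def summarize(dialogue: str, max_new_tokens: int = 64) -> str:
    """Summarize one dialogue string (hypothetical helper, not part of any library)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate to the context length the model was trained with.
    inputs = tokenizer(
        dialogue,
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    example = (
        "Anna: Are we still on for lunch tomorrow?\n"
        "Ben: Yes, 12:30 at the usual place.\n"
        "Anna: Great, see you then."
    )
    print(summarize(example))
```

The helper reloads the model on every call to keep the sketch short; for repeated use, load the tokenizer and model once and reuse them.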

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 16
seed: 17868
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 4.0
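
The hyperparameters above can be cross-checked with a little arithmetic. The sketch below assumes a single device (so the total batch size is the per-device batch size times the accumulation steps), takes the total of 460 optimizer steps from the final row of the training results, and assumes the common form of a cosine schedule: linear warmup followed by a half-cosine decay to zero. The name `lr_at` is hypothetical and the trainer's exact schedule may differ in detail.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05
# Final optimizer step reported in the training results.
TOTAL_STEPS = 460

# Assumption: one device, so effective batch = per-device batch x accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: warmup steps are the warmup ratio times the total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("warmup steps:", warmup_steps)
    # Learning rate at the end of each epoch listed in the training results.
    for step in (115, 230, 345, 460):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 128, and the learning rate peaks at the end of warmup before decaying towards zero by the last step.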

Training results

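The epoch 3 checkpoint (step 345) was kept as the final model. It has the highest ROUGE scores of the four epochs, and the evaluation results reported above correspond to this checkpoint.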

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
1.9528	0.9989	115	1.9189	40.093	18.2018	33.9749	36.9071	29.3333
1.5346	1.9978	230	1.8827	41.4676	18.3467	34.1909	38.2131	27.6633
1.1696	2.9967	345	1.9820	42.3629	18.4285	34.6339	38.7792	27.8033
0.9359	3.9957	460	2.1588	41.2237	17.8161	33.7101	37.9569	30.18
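
The card does not say which criterion was used to pick the final checkpoint, so the following is only an illustrative sketch. It copies the rows of the table above into plain Python and compares two common selection rules: highest ROUGE-1 and lowest validation loss. The dictionary keys are hypothetical names chosen for this example.

```python
# Rows copied from the training results table above.
RESULTS = [
    {"epoch": 0.9989, "step": 115, "val_loss": 1.9189, "rouge1": 40.093, "rouge2": 18.2018, "rougeL": 33.9749, "rougeLsum": 36.9071},
    {"epoch": 1.9978, "step": 230, "val_loss": 1.8827, "rouge1": 41.4676, "rouge2": 18.3467, "rougeL": 34.1909, "rougeLsum": 38.2131},
    {"epoch": 2.9967, "step": 345, "val_loss": 1.9820, "rouge1": 42.3629, "rouge2": 18.4285, "rougeL": 34.6339, "rougeLsum": 38.7792},
    {"epoch": 3.9957, "step": 460, "val_loss": 2.1588, "rouge1": 41.2237, "rouge2": 17.8161, "rougeL": 33.7101, "rougeLsum": 37.9569},
]

# Selection rule 1: the checkpoint with the highest ROUGE-1.
best_by_rouge1 = max(RESULTS, key=lambda row: row["rouge1"])

# Selection rule 2: the checkpoint with the lowest validation loss.
best_by_loss = min(RESULTS, key=lambda row: row["val_loss"])

if __name__ == "__main__":
    print("best by ROUGE-1:", best_by_rouge1["step"])
    print("best by validation loss:", best_by_loss["step"])
```

The two rules disagree here: validation loss is lowest at step 230, while every ROUGE metric peaks at step 345, which is the checkpoint that was kept. After step 345 the training loss keeps falling while validation loss rises and ROUGE drops, a pattern consistent with overfitting in the fourth epoch.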