
license: apache-2.0
base_model: google/switch-base-8
tags:
  - generated_from_trainer
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: switch-base-8-samsum-top-4-choose-1-deconly
    results:
      - task:
          name: Sequence-to-sequence Language Modeling
          type: text2text-generation
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: validation
          args: samsum
        metrics:
          - name: Rouge1
            type: rouge
            value: 47.2666

switch-base-8-samsum-top-4-choose-1-deconly

This model is a fine-tuned version of google/switch-base-8 on the samsum dataset. On the evaluation set (the validation split), the final checkpoint at step 4600 achieves the following results:

Loss: 1.5869
Rouge1: 47.2666
Rouge2: 24.2196
Rougel: 40.1766
Rougelsum: 43.8418
Gen Len: 16.9352
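
To make the Rouge1 figure above easier to interpret, here is a minimal sketch of a unigram-overlap F1 score. It is a simplified illustration, not the implementation used to produce the numbers in this card: it assumes lowercased whitespace tokenization with no stemming, and it assumes the reported scores are F-measures scaled to 0-100. The function name rouge1_f1 and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between two strings.

    Assumption: lowercased whitespace tokens, no stemming. Real ROUGE
    implementations tokenize and normalize differently.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical summary pair, scaled to 0-100 like the scores in this card.
score = 100 * rouge1_f1(
    "amanda baked cookies",
    "amanda baked cookies for jerry",
)
print(f"{score:.1f}")  # prints 75.0
```

In this example the prediction covers three of the five reference words with no extra words, so precision is 1.0, recall is 0.6, and F1 is 0.75.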

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
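
The learning_rate, lr_scheduler_type, and lr_scheduler_warmup_ratio settings above together define a learning-rate curve. The sketch below shows the shape such a schedule usually takes: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0. It is written in plain Python for illustration and is not the Trainer's own code. ASSUMED_TOTAL_STEPS is an assumption inferred from the logged step and epoch values (about 921 optimizer steps per epoch over 5 epochs), and linear_lr is a hypothetical helper name.

```python
import math

BASE_LR = 5e-05             # learning_rate from the list above
WARMUP_RATIO = 0.1          # lr_scheduler_warmup_ratio from the list above
ASSUMED_TOTAL_STEPS = 4605  # assumption: 921 steps per epoch * 5 epochs


def linear_lr(step: int, total_steps: int = ASSUMED_TOTAL_STEPS,
              base_lr: float = BASE_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 200, 461, 2533, 4605):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the peak rate of 5e-05 is reached at step 461, so the first logged evaluation (step 200) still falls inside the warmup phase.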

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
5.4611	0.2172	200	3.0917	23.5686	7.6846	20.6877	22.0746	14.7946
2.6551	0.4343	400	2.1027	39.7231	17.2476	33.3172	37.0509	17.1369
2.4452	0.6515	600	1.9255	42.9952	19.6478	35.8054	40.1569	17.3007
2.1259	0.8686	800	1.8270	43.9723	21.3238	37.0066	40.9323	16.1027
2.0957	1.0858	1000	1.7708	45.1103	21.769	37.9229	41.7446	17.2482
2.1168	1.3029	1200	1.7185	45.6806	22.0335	38.2398	42.4051	16.5941
2.1491	1.5201	1400	1.6982	46.0573	22.2803	38.33	42.531	16.9291
1.9829	1.7372	1600	1.6803	45.8845	22.4145	38.795	42.5814	16.4976
1.9741	1.9544	1800	1.6657	45.6645	22.0154	38.2445	42.2358	17.2689
1.8286	2.1716	2000	1.6462	46.7647	23.2912	39.4015	43.3207	16.8704
1.8177	2.3887	2200	1.6486	45.8872	22.8119	38.7398	42.3427	16.0403
1.8606	2.6059	2400	1.6270	45.9799	22.9475	38.9393	42.7565	16.6687
1.8327	2.8230	2600	1.6210	46.2715	23.4171	39.4324	43.0326	16.5452
1.6738	3.0402	2800	1.6242	46.1248	22.7245	38.8572	42.5884	16.8252
1.7515	3.2573	3000	1.6155	46.5372	23.4014	39.54	43.187	16.665
1.7728	3.4745	3200	1.6000	46.6652	23.4739	39.4761	43.2783	16.7873
1.7584	3.6916	3400	1.5922	47.2313	24.0035	39.9195	43.6996	16.7702
1.7082	3.9088	3600	1.5957	46.5132	23.4692	39.4884	43.2236	16.6553
1.5968	4.1260	3800	1.5916	47.2622	23.9444	40.1308	43.7971	16.9083
1.6439	4.3431	4000	1.5880	46.9607	23.7839	39.7431	43.5831	16.9621
1.6684	4.5603	4200	1.5930	47.2611	23.9828	40.0767	43.8297	16.8851
1.7749	4.7774	4400	1.5882	46.9562	23.874	39.8904	43.536	16.9377
1.6401	4.9946	4600	1.5869	47.2666	24.2196	40.1766	43.8418	16.9352
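
The Epoch column in the table above can be cross-checked against the Step column with a little arithmetic. The sketch below assumes the samsum training split holds 14,732 dialogues (a commonly reported figure that this card does not state) and that each step consumes one batch of 16 examples on a single device with no gradient accumulation. Both assumptions are labeled in the code, and epoch_at is a hypothetical helper.

```python
import math

TRAIN_BATCH_SIZE = 16            # train_batch_size from the hyperparameters
ASSUMED_TRAIN_EXAMPLES = 14_732  # assumption: size of the samsum train split

# Assumption: one optimizer step per batch (single device, no accumulation).
steps_per_epoch = math.ceil(ASSUMED_TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)


def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # prints 921
print(epoch_at(200))    # prints 0.2172
print(epoch_at(4600))   # prints 4.9946
```

Under these assumptions the computed values agree with the first and last rows of the table, which suggests evaluation ran every 200 steps and the final logged checkpoint sits just short of the end of the fifth epoch.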

Framework versions

Transformers 4.41.2
Pytorch 2.0.1+cu117
Datasets 2.20.0
Tokenizers 0.19.1