
model_v1e_5_8_8_4

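This model is a fine-tuned version of facebook/bart-large on a dataset that this card does not name (the dataset field reads "None"). It achieves the following results on the evaluation set:

  • Loss: 0.5522
  • Sacrebleu: 66.9834

These values match the epoch-4 row of the training results, which has the lowest validation loss of the run.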

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
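The hyperparameters above interact in a few simple ways. The sketch below makes them concrete: the effective batch per optimizer step, a linear learning-rate decay, and one Adam update with the listed betas and epsilon. It is an illustration under stated assumptions, not the training code. The card does not list these values, so they are assumed: one training device (consistent with 16 × 4 = 64), zero warmup steps, zero weight decay, and 2180 total optimizer steps (taken from the last row of the results table). The helper names are hypothetical.

```python
# Sketch of how the listed hyperparameters combine.
# ASSUMPTIONS (not stated in the card): one device, zero warmup steps,
# zero weight decay, 2180 total optimizer steps (last row of the results table).
import math

LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 16        # per-device micro-batch
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1              # assumption
TOTAL_STEPS = 2180           # assumption: final step in the results table
BETAS = (0.9, 0.999)
EPSILON = 1e-08


def effective_batch_size(per_device: int, accum: int, devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Linear warmup (none by default), then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def adam_step(param: float, grad: float, m: float, v: float, t: int,
              lr: float, betas=BETAS, eps: float = EPSILON):
    """One textbook Adam update for a scalar parameter; t counts from 1."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias corrections
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 64
print(linear_lr(0), linear_lr(TOTAL_STEPS // 2), linear_lr(TOTAL_STEPS))
p, m, v = adam_step(0.0, 0.5, 0.0, 0.0, t=1, lr=linear_lr(0))
print(p)  # about -1e-05: Adam's first step moves a weight by roughly lr
```

Under these assumptions the learning rate falls from 1e-05 at the first step to 0 at the last. Because of bias correction, the very first Adam step has magnitude close to the learning rate whatever the gradient's scale.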

Training results

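| Training Loss | Epoch | Step | Validation Loss | Sacrebleu |
|---|---|---|---|---|
| No log | 1.0 | 218 | 0.5616 | 66.0295 |
| No log | 2.0 | 437 | 0.5691 | 66.6355 |
| No log | 3.0 | 656 | 0.5544 | 66.8901 |
| No log | 4.0 | 875 | 0.5522 | 66.9834 |
| No log | 5.0 | 1093 | 0.5686 | 67.0746 |
| No log | 6.0 | 1312 | 0.5995 | 67.1015 |
| No log | 7.0 | 1531 | 0.5663 | 67.1106 |
| No log | 8.0 | 1750 | 0.5860 | 67.0824 |
| No log | 9.0 | 1968 | 0.6075 | 67.1805 |
| No log | 9.97 | 2180 | 0.6105 | 67.1350 |

"No log" means that no training-loss value was recorded for that evaluation.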
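Validation loss and Sacrebleu favour different epochs. Loss is lowest at epoch 4, while Sacrebleu keeps rising slowly until epoch 9. The short sketch below uses only the numbers from the results table to pick the "best" epoch under each criterion. The card does not say which criterion, if any, was used to select the reported checkpoint.

```python
# Per-epoch validation results copied from the training-results table.
# Each row: (epoch, validation_loss, sacrebleu)
RESULTS = [
    (1.0, 0.5616, 66.0295),
    (2.0, 0.5691, 66.6355),
    (3.0, 0.5544, 66.8901),
    (4.0, 0.5522, 66.9834),
    (5.0, 0.5686, 67.0746),
    (6.0, 0.5995, 67.1015),
    (7.0, 0.5663, 67.1106),
    (8.0, 0.5860, 67.0824),
    (9.0, 0.6075, 67.1805),
    (9.97, 0.6105, 67.1350),
]

best_by_loss = min(RESULTS, key=lambda row: row[1])  # lowest validation loss
best_by_bleu = max(RESULTS, key=lambda row: row[2])  # highest Sacrebleu

print("best by validation loss:", best_by_loss)  # (4.0, 0.5522, 66.9834)
print("best by Sacrebleu:", best_by_bleu)        # (9.0, 0.6075, 67.1805)
```

The two picks differ by about 0.2 Sacrebleu. Any choice between them depends on whether loss or the translation metric matters more for the intended use.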

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
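To reproduce results, it can help to confirm that a local environment matches the versions listed above. The following is a minimal sketch built on the standard-library `importlib.metadata`. Assumptions: PyTorch is installed under its PyPI name `torch`, and a local build tag such as `+cu121` is ignored during comparison. The helper name `version_mismatches` is hypothetical.

```python
# Compare installed package versions with the versions listed in this card.
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

CARD_VERSIONS = {
    "transformers": "4.39.3",
    "torch": "2.1.2",          # PyTorch's PyPI distribution name
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected: Dict[str, str],
                       lookup: Callable[[str], str] = version
                       ) -> Dict[str, Optional[str]]:
    """Return {package: installed version or None} for each package that differs."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = lookup(package)
        except PackageNotFoundError:
            installed = None
        # Drop a local build tag such as "+cu121" before comparing.
        public = installed.split("+", 1)[0] if installed else None
        if public != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(version_mismatches(CARD_VERSIONS) or "all versions match the card")
```

Passing a custom `lookup` makes the check testable without installing anything. Missing packages are reported as `None`.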

Safetensors

  • Model size: 406M params
  • Tensor type: F32