Hyperparameters

learning_rate=2e-5
per_device_train_batch_size=14
per_device_eval_batch_size=14
weight_decay=0.01
save_total_limit=3
num_train_epochs=3
predict_with_generate=True
fp16=True
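
The values above look like keyword arguments for the Hugging Face `transformers` trainer configuration (`predict_with_generate` is a `Seq2SeqTrainingArguments` option), although the card does not show the actual training script. A minimal sketch of how they might be passed, assuming `Seq2SeqTrainingArguments` was used; the `output_dir` value and the helper name `build_training_args` are hypothetical and not taken from the card:

```python
# Hyperparameters exactly as listed in the model card.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 14,
    "per_device_eval_batch_size": 14,
    "weight_decay": 0.01,
    "save_total_limit": 3,        # keep at most 3 checkpoints on disk
    "num_train_epochs": 3,
    "predict_with_generate": True,  # run generate() during evaluation
    "fp16": True,                 # mixed precision; generally needs a GPU
}


def build_training_args(output_dir="prophetnet-multinews-out"):
    """Build training arguments from the listed hyperparameters.

    Assumption: the card's values were passed to Seq2SeqTrainingArguments.
    `output_dir` is a placeholder path, not a value from the card.
    """
    # Imported lazily so the dictionary above is usable without transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping the values in a plain dictionary makes them easy to log or compare between runs before any trainer object is constructed.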

Training Output

global_step=7710,
training_loss=2.8554159399445727,
metrics={'train_runtime': 21924.7566,
'train_samples_per_second': 4.923,
'train_steps_per_second': 0.352,
'total_flos': 2.3807388210639667e+17,
'train_loss': 2.8554159399445727,
'epoch': 3.0}
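
The reported figures can be cross-checked against each other with simple arithmetic. The sketch below uses only numbers from this card; the variable names are illustrative. Because `train_samples_per_second` is rounded to three decimals, the derived sample counts are approximate.

```python
# Values copied from the hyperparameters and training output above.
global_step = 7710
train_runtime = 21924.7566      # seconds
samples_per_second = 4.923      # rounded in the trainer output
num_epochs = 3
per_device_batch = 14

# Throughput in optimizer steps, to compare with 'train_steps_per_second'.
steps_per_second = global_step / train_runtime

# Optimizer steps taken in each epoch.
steps_per_epoch = global_step // num_epochs

# Approximate number of training samples processed overall and per step.
total_samples = samples_per_second * train_runtime
samples_per_step = total_samples / global_step
samples_per_epoch = total_samples / num_epochs

# Wall-clock training time in hours.
runtime_hours = train_runtime / 3600

print(f"steps/second:      {steps_per_second:.3f}")
print(f"steps/epoch:       {steps_per_epoch}")
print(f"samples/step:      {samples_per_step:.2f}")
print(f"samples/epoch:     {samples_per_epoch:.0f}")
print(f"runtime (hours):   {runtime_hours:.2f}")
```

The samples-per-step figure comes out very close to the per-device batch size of 14, which is consistent with training on a single device without gradient accumulation, although the card does not state this explicitly. The run took a little over six hours and processed roughly 36,000 samples per epoch.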

Training Results

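| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Gen Len |
|-------|---------------|-----------------|--------|--------|--------|-----------|------|---------|
| 1 | 2.981200 | 2.831641 | 0.414500 | 0.147000 | 0.230700 | 0.230600 | 0.512800 | 140.734900 |
| 2 | 2.800900 | 2.789402 | 0.417300 | 0.148400 | 0.231800 | 0.231700 | 0.516000 | 141.158200 |
| 3 | 2.680300 | 2.780862 | 0.418300 | 0.148400 | 0.232200 | 0.232100 | 0.516800 | 140.872300 |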

Model size: 391M params (Safetensors)
Tensor type: F32

Dataset used to train usakha/Prophetnet_multiNews_model