DIAL-T0 / README.md
metadata
license: apache-2.0
tags:
  - generated_from_trainer
model-index:
  - name: t0-all_tasksv2-m1-t1
    results: []

t0-all_tasksv2-m1-t1

This model is a fine-tuned version of bigscience/T0_3B; the training dataset is not recorded in this card. It reports the following evaluation and training statistics:

  • Loss: 1.1591
  • Train Runtime: 31498.6176 seconds (about 8.75 hours)
  • Train Samples Per Second: 15.232
  • Train Steps Per Second: 0.212
  • Train Loss: 1.4163
  • Train Samples: 239899
  • Gen Len: 9.847
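
The throughput figures above can be cross-checked against each other. The following is a minimal sketch, assuming the runtime is reported in seconds and that the samples-per-second figure counts every pass over the training set (the epoch count is taken from the hyperparameter list later in this card). The variable names are illustrative only.

```python
# Figures copied from the statistics listed in this card.
train_samples = 239_899
train_runtime_s = 31_498.6176  # assumed to be seconds
num_epochs = 2.0               # from the hyperparameter list

# Total samples processed over the whole run (assumption: every epoch counts).
samples_seen = train_samples * num_epochs

samples_per_second = samples_seen / train_runtime_s
runtime_hours = train_runtime_s / 3600

print(f"samples/s: {samples_per_second:.3f}")
print(f"runtime:   {runtime_hours:.2f} h")
```

Under these assumptions the computed rate agrees with the reported Train Samples Per Second to three decimal places.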

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 72
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2.0
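
The total batch sizes listed above follow from the per-device values. The sketch below reproduces that arithmetic and estimates the number of optimizer steps. It assumes 239899 training samples (the Train Samples figure in this card), a final partial batch that is kept, and no warmup for the linear schedule, since the card does not record a warmup setting. `linear_lr` is a hypothetical helper written for illustration, not a function from any library.

```python
import math

# Values copied from the hyperparameter list.
per_device_train_batch_size = 3
per_device_eval_batch_size = 3
num_devices = 8
gradient_accumulation_steps = 3
learning_rate = 5e-05
num_epochs = 2.0
train_samples = 239_899  # "Train Samples" reported in this card

# Samples consumed per optimizer step across all devices.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# Evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Approximate optimizer steps (assumption: the last partial batch is kept).
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)
total_steps = int(steps_per_epoch * num_epochs)


def linear_lr(step, base_lr=learning_rate, total=total_steps, warmup=0):
    """Hypothetical helper: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(total_train_batch_size, total_eval_batch_size)
print(steps_per_epoch, total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

With these inputs, step 500 falls at roughly epoch 0.15, which is consistent with the Epoch column in the training results.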

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Accuracy | F1 | Recall | Precision | Bleu 1 | Bleu 2 | Bleu 3 | Bleu 4 | Rouge L | Gen Len |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.5957 | 0.15 | 500 | 1.3177 | 51.7673 | 5.9934 | 51.3052 | 51.4889 | 58.2201 | 58.2201 | 58.2201 | 58.2201 | 0.5452 | 0.0004 | 0.0 | 0.0 | 0.5025 | 6.144 |
| 1.5657 | 0.3 | 1000 | 1.2654 | 56.3471 | 6.2554 | 55.9191 | 56.0106 | 64.5902 | 64.5902 | 64.5902 | 64.5902 | 0.5834 | 0.0005 | 0.0 | 0.0 | 0.5423 | 6.363 |
| 1.4614 | 0.45 | 1500 | 1.2279 | 60.3041 | 6.4454 | 59.881 | 60.0203 | 69.9766 | 69.9766 | 69.9766 | 69.9766 | 0.6223 | 0.0005 | 0.0 | 0.0 | 0.5799 | 6.319 |
| 1.4733 | 0.6 | 2000 | 1.2001 | 63.1864 | 6.4428 | 62.7935 | 63.0146 | 74.192 | 74.192 | 74.192 | 74.192 | 0.6527 | 0.0005 | 0.0001 | 0.0 | 0.6102 | 6.319 |
| 1.3982 | 0.75 | 2500 | 1.1888 | 64.2445 | 6.6019 | 63.8196 | 63.9475 | 75.0351 | 75.0351 | 75.0351 | 75.0351 | 0.6606 | 0.0005 | 0.0001 | 0.0 | 0.6151 | 6.3657 |
| 1.4344 | 0.9 | 3000 | 1.1827 | 63.9356 | 6.7482 | 63.5225 | 63.72 | 74.6136 | 74.6136 | 74.6136 | 74.6136 | 0.6576 | 0.0005 | 0.0001 | 0.0 | 0.6123 | 6.3577 |
| 1.3281 | 1.05 | 3500 | 1.1725 | 65.0553 | 6.6823 | 64.6434 | 64.8219 | 76.2529 | 76.2529 | 76.2529 | 76.2529 | 0.6679 | 0.0005 | 0.0001 | 0.0 | 0.6206 | 6.374 |
| 1.3033 | 1.2 | 4000 | 1.1753 | 64.7545 | 6.5216 | 64.3853 | 64.5344 | 76.1124 | 76.1124 | 76.1124 | 76.1124 | 0.6628 | 0.0005 | 0.0001 | 0.0 | 0.619 | 6.4473 |
| 1.2871 | 1.35 | 4500 | 1.1656 | 65.6713 | 6.7135 | 65.185 | 65.4454 | 77.0023 | 77.0023 | 77.0023 | 77.0023 | 0.6718 | 0.0005 | 0.0001 | 0.0 | 0.6246 | 6.472 |
| 1.3423 | 1.5 | 5000 | 1.1669 | 65.8966 | 6.7928 | 65.5016 | 65.6741 | 77.377 | 77.377 | 77.377 | 77.377 | 0.6772 | 0.0005 | 0.0001 | 0.0 | 0.6288 | 6.36 |
| 1.333 | 1.65 | 5500 | 1.1627 | 65.9726 | 6.7915 | 65.5878 | 65.7582 | 77.4239 | 77.4239 | 77.4239 | 77.4239 | 0.6742 | 0.0005 | 0.0001 | 0.0 | 0.6273 | 6.4767 |
| 1.2749 | 1.8 | 6000 | 1.1591 | 66.5212 | 6.9115 | 66.0695 | 66.3204 | 77.9859 | 77.9859 | 77.9859 | 77.9859 | 0.681 | 0.0006 | 0.0001 | 0.0 | 0.6324 | 6.4403 |
| 1.2891 | 1.95 | 6500 | 1.1571 | 66.2478 | 6.8368 | 65.8198 | 66.0423 | 77.5644 | 77.5644 | 77.5644 | 77.5644 | 0.6778 | 0.0005 | 0.0001 | 0.0 | 0.6298 | 6.4417 |
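
The checkpoints in the table above can be ranked by different criteria. The sketch below copies the Step, Validation Loss, and Accuracy columns and finds the best row under each one. It is only an illustration: the card does not state which criterion, if any, was used to select the reported checkpoint.

```python
# (step, validation_loss, accuracy) copied from the training results table.
rows = [
    (500, 1.3177, 58.2201),
    (1000, 1.2654, 64.5902),
    (1500, 1.2279, 69.9766),
    (2000, 1.2001, 74.192),
    (2500, 1.1888, 75.0351),
    (3000, 1.1827, 74.6136),
    (3500, 1.1725, 76.2529),
    (4000, 1.1753, 76.1124),
    (4500, 1.1656, 77.0023),
    (5000, 1.1669, 77.377),
    (5500, 1.1627, 77.4239),
    (6000, 1.1591, 77.9859),
    (6500, 1.1571, 77.5644),
]

# Lowest validation loss and highest accuracy can point at different steps.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_accuracy = max(rows, key=lambda r: r[2])

print("lowest validation loss:", best_by_loss)
print("highest accuracy:      ", best_by_accuracy)
```

The two criteria disagree here: validation loss is lowest at step 6500, while accuracy peaks at step 6000. The headline Loss of 1.1591 matches the step 6000 row.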

Framework versions

  • Transformers 4.20.1
  • Pytorch 1.11.0
  • Datasets 2.3.2
  • Tokenizers 0.12.1