henryu-lin/t5-3b-samsum-deepspeed

`t5-3b-samsum-deepspeed`

This model was trained using Microsoft's AzureML and DeepSpeed's ZeRO 2 optimization. It was fine-tuned on the SAMSum corpus from the t5-3b checkpoint.

More information on the fine-tuning process (includes samples and benchmarks):
(currently still WIP, updates coming soon: 7/6/21~7/9/21)

Resource Usage

These results were retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.

key	value
AzureML SKU	ND40rs_v2 (8 X V100 32GB)
Region	US West 2
Run Duration	43m 51.05s
Compute Cost (LowPriority/Dedicated)	$3.22/$16.10 (USD)
Average CPU Utilization	46.0%
Average GPU Utilization	56.9%
GPU Memory Usage (Avg/Peak)	26.77/30.49 (GB)
Total GPU Energy Usage	2448.69 (kJ)

*Compute cost is calculated from the run duration and the SKU's price per hour. Current SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/
*Peak memory usage is the average of the per-GPU peaks across all utilized GPUs.
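The cost footnote can be checked with a little arithmetic. The sketch below back-calculates the hourly rate implied by the reported run duration and costs. The variable names are illustrative, and the resulting rates are derived from the table above rather than quoted from Azure's price list.

```python
# Run duration reported in the table: 43m 51.05s, converted to hours.
duration_hours = (43 * 60 + 51.05) / 3600

# Compute costs reported in the table, in USD.
low_priority_cost = 3.22
dedicated_cost = 16.10

# Hourly rates implied by: cost = duration_hours * price_per_hour.
implied_low_priority_rate = low_priority_cost / duration_hours
implied_dedicated_rate = dedicated_cost / duration_hours

print(f"duration: {duration_hours:.4f} h")
print(f"implied low-priority rate: ${implied_low_priority_rate:.2f}/h")
print(f"implied dedicated rate: ${implied_dedicated_rate:.2f}/h")
```

The two reported costs differ by a factor of five, so the low-priority run cost about 20% of what the same run would have cost on a dedicated cluster.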

Carbon Emissions

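These results were obtained using CodeCarbon. The carbon emission is estimated from the training runtime only (excluding setup and evaluation runtime). CodeCarbon reports duration in seconds, emissions in kg of CO2-equivalent, and energy consumed in kWh.
CodeCarbon: https://github.com/mlco2/codecarbon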

key	value
timestamp	2021-07-06T21:57:39
duration	1841.4621863365173
emissions	0.17802492531467784
energy_consumed	0.5982020339874927
country_name	USA
region	Washington
cloud_provider	azure
cloud_region	westus2
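To put the raw CodeCarbon values in context, the sketch below derives a few quantities from them. It assumes CodeCarbon's default units (seconds, kg CO2-equivalent, kWh); the variable names are illustrative and the derived figures are simple arithmetic on the table, not additional measurements.

```python
# Values copied from the CodeCarbon table above.
duration_s = 1841.4621863365173    # assumed unit: seconds
emissions_kg = 0.17802492531467784 # assumed unit: kg CO2-equivalent
energy_kwh = 0.5982020339874927    # assumed unit: kWh

# 1 kWh = 3600 kJ.
energy_kj = energy_kwh * 3600

# Carbon intensity implied by the emissions/energy ratio.
carbon_intensity = emissions_kg / energy_kwh

# Average power draw over the tracked training window.
average_power_kw = energy_kwh / (duration_s / 3600)

print(f"energy: {energy_kj:.2f} kJ")
print(f"implied carbon intensity: {carbon_intensity:.4f} kg CO2eq/kWh")
print(f"average power: {average_power_kw:.2f} kW")
```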

Hyperparameters

fp16: True
per device batch size: 2
effective batch size: 16
epoch: 3.0
learning rate: 3e-5
weight decay: 0.0
seed: 1

*The same per-device batch size was used for evaluation.
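The relationship between the per-device and effective batch sizes can be sketched as follows. The GPU count comes from the ND40rs_v2 SKU listed under Resource Usage; the gradient accumulation value of 1 is an assumption (it is not stated in this card, but it is the value that makes 2 x 8 equal 16). The sample count and step count are the train_samples and total_steps values reported under Results.

```python
import math

per_device_batch_size = 2
num_gpus = 8                     # ND40rs_v2: 8 x V100
gradient_accumulation_steps = 1  # assumption: not stated in the card
epochs = 3
train_samples = 14732            # from the Results section

# Samples consumed per optimizer step across all devices.
effective_batch_size = per_device_batch_size * num_gpus * gradient_accumulation_steps

# The last batch of each epoch is partial, hence the ceiling.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(effective_batch_size, steps_per_epoch, total_steps)
```

Under these assumptions the computed step count agrees with the reported total_steps of 2763.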

DeepSpeed

Optimizer = AdamW, Scheduler = WarmupDecayLR, Offload = none

  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1000000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1000000000,
    "contiguous_gradients": true
  }
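For context, the fragment above is only the `zero_optimization` section of a DeepSpeed config. The sketch below shows how a complete config file might be assembled around it. This is a reconstruction under stated assumptions, not the author's actual file: the learning rate, weight decay, fp16 setting, micro batch size, and total step count are taken from this card, while `warmup_num_steps`, `gradient_accumulation_steps`, and the output filename `ds_config.json` are placeholders chosen for illustration.

```python
import json

ds_config = {
    "fp16": {"enabled": True},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 3e-5, "weight_decay": 0.0},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 3e-5,
            "warmup_num_steps": 0,     # assumption: warmup length is not stated in the card
            "total_num_steps": 2763,   # total_steps from the Results section
        },
    },
    # Copied verbatim from the fragment above; no offload section, since Offload = none.
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1000000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1000000000,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 1,  # assumption: implied by 2 x 8 GPUs = 16
}

# Serialize to JSON; the filename is hypothetical.
config_text = json.dumps(ds_config, indent=2)
with open("ds_config.json", "w") as f:
    f.write(config_text)
```

A file like this is typically passed to a training script through the Hugging Face Trainer's `--deepspeed` argument.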

Usage

from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
summarizer = pipeline("summarization", model="henryu-lin/t5-3b-samsum-deepspeed")

conversation = '''Henry: Hey, is Nate coming over to watch the movie tonight?
Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too.
Henry: Nice, I'm really looking forward to seeing them again.
'''
# The pipeline returns a list with one dict per input, keyed by "summary_text".
print(summarizer(conversation)[0]["summary_text"])

Results

ROUGE	Score
eval_rouge1	54.7875
eval_rouge2	30.565
eval_rougeL	45.7625
eval_rougeLsum	50.3915
predict_rouge1	53.6628
predict_rouge2	29.0196
predict_rougeL	45.1257
predict_rougeLsum	49.171

Metric	Value
eval_gen_len	25.3399
predict_gen_len	24.9133
train_loss	1.1206104169494209
eval_loss	1.0732421875
predict_loss	1.087890625
train_runtime	1841.3751
train_samples	14732
train_samples_per_second	24.002
train_steps_per_second	1.501
eval_runtime	163.8357
eval_samples	818
eval_samples_per_second	4.993
eval_steps_per_second	0.317
predict_runtime	168.8245
predict_samples	819
predict_samples_per_second	4.851
predict_steps_per_second	0.308
total_steps	2763
total_flos	1.84452086400811e+17
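The throughput rows in the table follow from the sample counts, step counts, and runtimes. The sketch below recomputes them as a consistency check. It assumes a global evaluation batch of 16 (2 per device on 8 GPUs), which is inferred from the Hyperparameters section rather than stated for evaluation directly; the variable names are illustrative.

```python
import math

epochs = 3
train_samples, train_runtime, total_steps = 14732, 1841.3751, 2763
eval_samples, eval_runtime = 818, 163.8357
predict_samples, predict_runtime = 819, 168.8245
global_eval_batch = 16  # assumption: 2 per device x 8 GPUs

# Training throughput counts every sample once per epoch.
train_samples_per_second = train_samples * epochs / train_runtime
train_steps_per_second = total_steps / train_runtime

# Evaluation and prediction make a single pass over their splits.
eval_samples_per_second = eval_samples / eval_runtime
predict_samples_per_second = predict_samples / predict_runtime
eval_steps_per_second = math.ceil(eval_samples / global_eval_batch) / eval_runtime
predict_steps_per_second = math.ceil(predict_samples / global_eval_batch) / predict_runtime

print(f"{train_samples_per_second:.3f} {train_steps_per_second:.3f}")
print(f"{eval_samples_per_second:.3f} {eval_steps_per_second:.3f}")
print(f"{predict_samples_per_second:.3f} {predict_steps_per_second:.3f}")
```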
