---
language: en
tags:
- azureml
- t5
- summarization
- deepspeed
license: apache-2.0
datasets:
- samsum
model-index:
- name: t5-large-samsum-deepspeed
  results:
  - task: 
      name: Abstractive Text Summarization
      type: abstractive-text-summarization
    dataset:
      name: "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization" 
      type: samsum
widget:
- text: | 
    Kevin: Hey man, are you excited to watch Finding Nemo tonight?
    Henry: Yea, I can't wait to watch that same movie for the 89th time. Is Nate coming over to watch it with us tonight?
    Kevin: Yep, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
    Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. I didn't get to start on it until an hour ago, and it's due in 30 minutes.
    Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate is bringing his girlfriend and their cat too.
    Henry: Nice, I'm really looking forward to seeing them again.
---

## `t5-large-samsum-deepspeed`
This model was fine-tuned from the `t5-large` checkpoint on the `SAMSum` corpus, using Microsoft's `AzureML` with `DeepSpeed`'s ZeRO Stage 2 optimization.

More information on the fine-tuning process (including samples and benchmarks):  
*(currently still WIP; major updates coming soon: 7/6/21~7/9/21)*

## Resource Usage
These results are retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.

| key | value |
| --- | ----- |
| AzureML SKU | ND40rs_v2 (8 X V100 32GB) |
| Region | US West 2 |
| Run Duration | 12m 47.13s |
| Compute Cost (LowPriority/Dedicated) | $0.94/$4.69 (USD) |
| Average CPU Utilization | 51.2% |
| Average GPU Utilization | 42.0% |
| GPU Memory Usage (Avg/Peak) | 24.85/28.79 (GB) |
| Total GPU Energy Usage | 670.38 (kJ) |

*Compute cost is calculated from the run duration and the SKU's price per hour. Updated SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/  
*Peak memory usage is calculated as the average peak across all utilized GPUs.  
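As a sanity check, the listed cost follows directly from the run duration and the SKU's hourly rate. The hourly rates below are back-calculated from this table (i.e., pricing at the time of the run) and may differ from current pricing:

```python
# Back-of-the-envelope cost check: run duration x hourly SKU rate.
duration_hours = (12 * 60 + 47.13) / 3600   # 12m 47.13s -> ~0.2131 h

low_priority_rate = 4.41    # USD/hour, ND40rs_v2 low-priority (inferred from table)
dedicated_rate = 22.03      # USD/hour, ND40rs_v2 dedicated (inferred from table)

print(f"LowPriority: ${duration_hours * low_priority_rate:.2f}")  # ~$0.94
print(f"Dedicated:   ${duration_hours * dedicated_rate:.2f}")     # ~$4.69
```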

### Carbon Emissions
These results were obtained using `codecarbon`. Emissions are estimated from the training runtime only (setup and evaluation runtime are excluded).  
CodeCarbon: https://github.com/mlco2/codecarbon  

| key | value |
| --- | ----- |
| timestamp | 2021-07-08T06:29:27 |
| duration (s) | 515.5018835067749 |
| emissions (kgCO₂eq) | 0.043562840982919106 |
| energy_consumed (kWh) | 0.14638051405550773 |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |
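For reference, a minimal sketch of how such measurements are typically collected with `codecarbon`; the tracker placement and the `train()` entry point are illustrative, not the exact training script:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
try:
    train()  # hypothetical training entry point
finally:
    emissions_kg = tracker.stop()  # returns the estimated emissions in kgCO2eq
    print(f"Estimated emissions: {emissions_kg} kgCO2eq")
```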

## Hyperparameters
```yaml
fp16: True
per device batch size: 8
effective batch size: 64
epoch: 3.0
learning rate: 1e-4
weight decay: 0.1
seed: 1
```
*The same `per device batch size` was used for evaluation. The effective batch size of 64 corresponds to the per-device batch size (8) across the 8 GPUs.

### DeepSpeed
Optimizer = `AdamW`, Scheduler = `WarmupDecayLR`, Offload = `none`. ZeRO configuration (excerpt from the full DeepSpeed config):
```json
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1300000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1300000000,
    "contiguous_gradients": true
  }
```
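For illustration, a minimal sketch of how the hyperparameters and DeepSpeed config above could be wired into Hugging Face `Seq2SeqTrainingArguments`; the output directory and the `ds_config.json` file name are assumptions, and the full config file would also carry the fp16, optimizer, and scheduler settings:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-large-samsum-deepspeed",  # hypothetical output path
    fp16=True,
    per_device_train_batch_size=8,   # x 8 GPUs = effective batch size 64
    per_device_eval_batch_size=8,    # same per-device batch size for eval
    num_train_epochs=3.0,
    learning_rate=1e-4,
    weight_decay=0.1,
    seed=1,
    deepspeed="ds_config.json",      # hypothetical file holding the config above
)
```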

## Usage
```python
from transformers import pipeline

summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")

conversation = '''Kevin: Hey man, are you excited to watch Finding Nemo tonight?
Henry: Yea, I can't wait to watch that same movie for the 89th time. Is Nate coming over to watch it with us tonight?
Kevin: Yep, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. I didn't get to start on it until an hour ago, and it's due in 30 minutes.
Kevin: Okay dude, you should take it out as soon as possible. By the way, Nate is bringing his girlfriend and their cat too.
Henry: Nice, I'm really looking forward to seeing them again.
'''

# Returns a list with one dict, e.g. [{'summary_text': '...'}]
print(summarizer(conversation))
```
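Generation parameters can be forwarded through the pipeline call. The values below are illustrative, not the settings used for the reported results:

```python
# Forward generate() kwargs through the pipeline call (illustrative values).
print(summarizer(conversation, max_length=60, min_length=10, do_sample=False))
```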

## Results
`eval_*` metrics are reported on the SAMSum validation split (818 samples) and `predict_*` metrics on the test split (819 samples), matching the sample counts in the metrics table below.

| ROUGE | Score |
| ----- | ----- |
| eval_rouge1 | 53.0823 |
| eval_rouge2 | 28.7097 |
| eval_rougeL | 43.939 |
| eval_rougeLsum | 49.067 |
| predict_rouge1 | 51.6716 |
| predict_rouge2 | 26.5372 |
| predict_rougeL | 42.9681 |
| predict_rougeLsum | 47.4084 |

| Metric | Value |
| ------ | ----- |
| eval_gen_len | 26.4071 |
| predict_gen_len | 25.9451 |
| train_loss | 1.3212629926497115 |
| eval_loss | 1.23828125 |
| predict_loss | 1.2333984375 |
| train_runtime | 515.2198 |
| train_samples | 14732 |
| train_samples_per_second | 85.781 |
| train_steps_per_second | 1.345 |
| eval_runtime | 61.275 |
| eval_samples | 818 |
| eval_samples_per_second | 13.35 |
| eval_steps_per_second | 0.212 |
| predict_runtime | 63.3732 |
| predict_samples | 819 |
| predict_samples_per_second | 12.923 |
| predict_steps_per_second | 0.205 |
| total_steps | 693 |
| total_flos | 7.20140924616704e+16 |
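As a consistency check, the reported throughput and step count follow directly from the other values in this table:

```python
import math

# train_samples_per_second = train_samples * epochs / train_runtime
print(14732 * 3.0 / 515.2198)      # ~85.78, matching the reported 85.781

# total_steps = ceil(train_samples / effective_batch_size) * epochs
print(math.ceil(14732 / 64) * 3)   # 693, matching the reported total_steps
```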