---
language: en
tags:
- azureml
- t5
- summarization
- deepspeed
license: apache-2.0
datasets:
- samsum
model-index:
- name: t5-large-samsum-deepspeed
  results:
  - task:
      name: Abstractive Text Summarization
      type: abstractive-text-summarization
    dataset:
      name: "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization"
      type: samsum
widget:
- text: |
    Henry: Hey, is Nate coming over to watch the movie tonight?
    Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
    Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
    Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too.
    Henry: Nice, I'm really looking forward to seeing them again.
---

## `t5-large-samsum-deepspeed`
This model was trained using Microsoft's `AzureML` and `DeepSpeed`'s ZeRO 2 optimization. It was fine-tuned on the `SAMSum` corpus from the `t5-large` checkpoint.

More information on the fine-tuning process (including samples and benchmarks):
*(currently still WIP, major updates coming soon: 7/6/21~7/9/21)*
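
The exact training code is not published yet (see the note above), but a typical T5 preprocessing setup for SAMSum looks roughly like the sketch below; the task prefix and sequence lengths are illustrative assumptions rather than confirmed settings.

```python
# Illustrative sketch only; the exact preprocessing used for this model is not published yet.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-large")
samsum = load_dataset("samsum")  # splits: train (14732), validation (818), test (819)

def preprocess(batch):
    # T5 is a text-to-text model, so a task prefix is conventionally prepended to the dialogue.
    inputs = tokenizer(["summarize: " + dialogue for dialogue in batch["dialogue"]],
                       max_length=512, truncation=True)   # assumed max source length
    labels = tokenizer(batch["summary"], max_length=128, truncation=True)  # assumed max target length
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = samsum.map(preprocess, batched=True,
                       remove_columns=samsum["train"].column_names)
```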

## Resource Usage
These results are retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.

| key | value |
| --- | ----- |
| AzureML SKU | ND40rs_v2 (8 x V100 32GB) |
| Region | US West 2 |
| Run Duration | 12m 47.13s |
| Compute Cost (LowPriority/Dedicated) | $0.94/$4.69 (USD) |
| Average CPU Utilization | 51.2% |
| Average GPU Utilization | 42.0% |
| GPU Memory Usage (Avg/Peak) | 24.85/28.79 (GB) |
| Total GPU Energy Usage | 670.38 (kJ) |

*Compute cost is calculated from the run duration and the SKU's price per hour. Updated SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/
*Peak memory usage is calculated from the average peak across all utilized GPUs.
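
As a quick sanity check, the cost figures above are consistent with the run duration multiplied by approximate hourly rates for the SKU (the rates below are back-calculated from the table, not official Azure pricing):

```python
# Back-of-the-envelope check of the compute cost figures above.
# The hourly rates are approximations inferred from the table, not official Azure pricing.
run_duration_h = (12 * 60 + 47.13) / 3600   # 12m 47.13s ~= 0.213 h

low_priority_rate = 4.41    # assumed USD/hour for ND40rs_v2 (low priority)
dedicated_rate = 22.03      # assumed USD/hour for ND40rs_v2 (dedicated)

print(round(run_duration_h * low_priority_rate, 2))  # ~0.94
print(round(run_duration_h * dedicated_rate, 2))     # ~4.69
```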

### Carbon Emissions
These results are obtained using `codecarbon`. The carbon emissions are estimated from training runtime only (excluding setup and evaluation runtime).
CodeCarbon: https://github.com/mlco2/codecarbon

| key | value |
| --- | ----- |
| timestamp | 2021-07-08T06:29:27 |
| duration (s) | 515.5018835067749 |
| emissions (kg CO2eq) | 0.043562840982919106 |
| energy_consumed (kWh) | 0.14638051405550773 |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |
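
For reference, such measurements are typically collected with codecarbon's `EmissionsTracker`; the exact tracker configuration used for this run is not published, so the sketch below is only indicative.

```python
# Indicative sketch of collecting the figures above with codecarbon;
# the exact tracker configuration used for this run is not published.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()        # also writes an emissions.csv with duration/emissions/energy
tracker.start()
try:
    run_training()                  # placeholder for the actual training loop
finally:
    emissions_kg = tracker.stop()   # estimated emissions in kg CO2eq
    print(emissions_kg)
```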

## Hyperparameters
```yaml
fp16: True
per device batch size: 8
effective batch size: 64
epoch: 3.0
learning rate: 1e-4
weight decay: 0.1
seed: 1
```
*The same `per device batch size` was used for evaluations.
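
The effective batch size follows from the per-device batch size times the 8 GPUs in the SKU (8 x 8 = 64, i.e. no gradient accumulation). Expressed as Hugging Face `Seq2SeqTrainingArguments`, the settings above would look roughly like this sketch (the output path, config file name, and `predict_with_generate` flag are assumptions; the exact training script is not published):

```python
# Sketch of the hyperparameters above as Hugging Face Seq2SeqTrainingArguments.
# Effective batch size = 8 (per device) x 8 GPUs = 64, so no gradient accumulation is needed.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="outputs",            # illustrative path
    fp16=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,    # same per-device batch size for evaluations
    gradient_accumulation_steps=1,
    num_train_epochs=3.0,
    learning_rate=1e-4,
    weight_decay=0.1,
    seed=1,
    deepspeed="ds_config.json",      # see the DeepSpeed section below (file name is illustrative)
    predict_with_generate=True,      # assumed, since ROUGE and gen_len are reported below
)
```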

### DeepSpeed
Optimizer = `AdamW`, Scheduler = `WarmupDecayLR`, Offload = `none`
```json
"zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1300000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1300000000,
    "contiguous_gradients": true
}
```
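
The snippet above is only the `zero_optimization` block. Embedded in a full config with the stated optimizer and scheduler, and written as a Python dict, it would look roughly like the sketch below; the `auto` placeholders are the usual values resolved by the Hugging Face Trainer integration, not confirmed settings from this run.

```python
# Sketch of a full DeepSpeed config around the zero_optimization block above.
# "auto" values are placeholders resolved by the Hugging Face Trainer, not confirmed settings.
ds_config = {
    "fp16": {"enabled": "auto"},
    "optimizer": {"type": "AdamW", "params": {"lr": "auto", "weight_decay": "auto"}},
    "scheduler": {"type": "WarmupDecayLR",
                  "params": {"warmup_num_steps": "auto", "total_num_steps": "auto"}},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1300000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1300000000,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
# This dict (or an equivalent JSON file) is what the `deepspeed` argument above points to.
```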

## Usage
```python
from transformers import pipeline
summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")

conversation = '''Henry: Hey, is Nate coming over to watch the movie tonight?
Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too.
Henry: Nice, I'm really looking forward to seeing them again.
'''
summarizer(conversation)
```
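
The checkpoint can also be used without the pipeline helper; the generation settings below are illustrative defaults, not the exact parameters used for the reported scores.

```python
# Equivalent usage without the pipeline helper; generation settings are illustrative.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("henryu-lin/t5-large-samsum-deepspeed")
model = AutoModelForSeq2SeqLM.from_pretrained("henryu-lin/t5-large-samsum-deepspeed")

# "summarize: " is T5's conventional task prefix; `conversation` is the string from the snippet above.
inputs = tokenizer("summarize: " + conversation, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```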

## Results
| ROUGE | Score |
| ----- | ----- |
| eval_rouge1 | 53.0823 |
| eval_rouge2 | 28.7097 |
| eval_rougeL | 43.939 |
| eval_rougeLsum | 49.067 |
| predict_rouge1 | 51.6716 |
| predict_rouge2 | 26.5372 |
| predict_rougeL | 42.9681 |
| predict_rougeLsum | 47.4084 |

| Metric | Value |
| ------ | ----- |
| eval_gen_len | 26.4071 |
| predict_gen_len | 25.9451 |
| train_loss | 1.3212629926497115 |
| eval_loss | 1.23828125 |
| predict_loss | 1.2333984375 |
| train_runtime | 515.2198 |
| train_samples | 14732 |
| train_samples_per_second | 85.781 |
| train_steps_per_second | 1.345 |
| eval_runtime | 61.275 |
| eval_samples | 818 |
| eval_samples_per_second | 13.35 |
| eval_steps_per_second | 0.212 |
| predict_runtime | 63.3732 |
| predict_samples | 819 |
| predict_samples_per_second | 12.923 |
| predict_steps_per_second | 0.205 |
| total_steps | 693 |
| total_flos | 7.20140924616704e+16 |
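
The `eval_rouge*` scores can be reproduced approximately on the SAMSum validation split along these lines; generation follows the pipeline defaults, so the numbers may differ slightly from the table above.

```python
# Rough reproduction of the eval_rouge* scores on the SAMSum validation split.
# Generation follows the pipeline defaults, so results may differ slightly from the table.
from datasets import load_dataset, load_metric
from transformers import pipeline

summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed",
                      device=0)                          # device=0 assumes a GPU is available
samsum_val = load_dataset("samsum", split="validation")  # 818 samples, matching eval_samples above

predictions = [out["summary_text"]
               for out in summarizer(samsum_val["dialogue"], batch_size=8)]
rouge = load_metric("rouge")
scores = rouge.compute(predictions=predictions, references=samsum_val["summary"])
print({key: value.mid.fmeasure * 100 for key, value in scores.items()})
```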