henryu-lin committed on
Commit
81ca18a
1 Parent(s): 0e43a37

Create README.md

  1. README.md +140 -0
README.md ADDED
@@ -0,0 +1,140 @@
---
language: en
tags:
- azureml
- t5
- summarization
- deepspeed
license: apache-2.0
datasets:
- samsum
model-index:
- name: t5-large-samsum-deepspeed
  results:
  - task:
      name: Abstractive Text Summarization
      type: abstractive-text-summarization
    dataset:
      name: "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization"
      type: samsum
widget:
- text: |
    Henry: Hey, is Nate coming over to watch the movie tonight?
    Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
    Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
    Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too.
    Henry: Nice, I'm really looking forward to seeing them again.
---

## `t5-large-samsum-deepspeed`
This model was trained using Microsoft's `AzureML` and `DeepSpeed`'s ZeRO 2 optimization. It was fine-tuned from the `t5-large` checkpoint on the `SAMSum` corpus.

More information on the fine-tuning process (including samples and benchmarks):
*(currently still WIP, major updates coming soon: 7/6/21~7/9/21)*

## Resource Usage
These results were retrieved from AzureML Studio's resource monitoring module. All experiments were run on AzureML's low-priority clusters.

| key | value |
| --- | ----- |
| AzureML SKU | ND40rs_v2 (8 x V100 32GB) |
| Region | US West 2 |
| Run Duration | 12m 47.13s |
| Compute Cost (LowPriority/Dedicated) | $0.94/$4.69 (USD) |
| Average CPU Utilization | 51.2% |
| Average GPU Utilization | 42.0% |
| GPU Memory Usage (Avg/Peak) | 24.85/28.79 (GB) |
| Total GPU Energy Usage | 670.38 (kJ) |

*Compute cost is calculated from the run duration and the SKU's price per hour. Current SKU pricing can be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/

*Peak memory usage is the average of the per-GPU peaks across all utilized GPUs.

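The compute cost footnote can be sketched as a pro-rata calculation. The hourly rates below are assumptions chosen to be consistent with the costs in the table; they are not quoted from the pricing page, and the helper name `compute_cost` is hypothetical.

```python
def compute_cost(duration_seconds: float, price_per_hour: float) -> float:
    """Cost of a run billed pro rata by the hour."""
    return duration_seconds / 3600.0 * price_per_hour


# Run duration from the table: 12m 47.13s.
RUN_SECONDS = 12 * 60 + 47.13

# ASSUMED hourly rates in USD, picked so the results line up with the table.
# Check the pricing page for the current values.
DEDICATED_PER_HOUR = 22.032
LOW_PRIORITY_PER_HOUR = 4.4064

dedicated = round(compute_cost(RUN_SECONDS, DEDICATED_PER_HOUR), 2)
low_priority = round(compute_cost(RUN_SECONDS, LOW_PRIORITY_PER_HOUR), 2)
print(f"low priority: ${low_priority}, dedicated: ${dedicated}")
```

With these assumed rates, the rounded results agree with the Compute Cost row above.
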
### Carbon Emissions
These results were obtained using `codecarbon`. Emissions are estimated from the training runtime only (setup and evaluation runtime are excluded).
CodeCarbon: https://github.com/mlco2/codecarbon

| key | value |
| --- | ----- |
| timestamp | 2021-07-08T06:29:27 |
| duration (s) | 515.5018835067749 |
| emissions (kg CO2eq) | 0.043562840982919106 |
| energy_consumed (kWh) | 0.14638051405550773 |
| country_name | USA |
| region | Washington |
| cloud_provider | azure |
| cloud_region | westus2 |

## Hyperparameters
```yaml
fp16: True
per device batch size: 8
effective batch size: 64
epoch: 3.0
learning rate: 1e-4
weight decay: 0.1
seed: 1
```
*The same `per device batch size` was used for evaluation.

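The relationship between the per-device and effective batch sizes, and the resulting step count, can be checked with a few lines of arithmetic. This is a minimal sketch: the GPU count comes from the SKU in Resource Usage and the sample count from Results, while a gradient accumulation factor of 1 is an assumption (the card does not state it).

```python
import math

NUM_GPUS = 8               # ND40rs_v2 has 8 V100s (see Resource Usage)
PER_DEVICE_BATCH_SIZE = 8  # from the hyperparameters above
GRAD_ACCUM_STEPS = 1       # ASSUMPTION: not stated in the card
TRAIN_SAMPLES = 14732      # train_samples from Results
EPOCHS = 3

# Each optimizer step consumes one micro-batch per GPU per accumulation step.
effective_batch_size = PER_DEVICE_BATCH_SIZE * NUM_GPUS * GRAD_ACCUM_STEPS

# The last batch of an epoch may be partial, hence the ceiling.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch_size)
total_steps = steps_per_epoch * EPOCHS

print(effective_batch_size, steps_per_epoch, total_steps)
```

Under that assumption the numbers agree with the reported `effective batch size` of 64 and `total_steps` of 693.
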
### DeepSpeed
Optimizer = `AdamW`, Scheduler = `WarmupDecayLR`, Offload = `none`
```json
"zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1300000000,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 1300000000,
    "contiguous_gradients": true
}
```

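For orientation, the fragment above could sit inside a complete DeepSpeed config along the following lines. This is a sketch, not the config file used for this run: only the `zero_optimization` block, the optimizer and scheduler types, and the values listed under Hyperparameters come from this card. The `"auto"` placeholders are resolved by the Hugging Face `Trainer` DeepSpeed integration, so using them assumes the config is passed through `Trainer`; plain DeepSpeed needs concrete numbers there.

```python
import json

ds_config = {
    "fp16": {"enabled": True},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-4, "weight_decay": 0.1},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        # Warmup settings are not given in the card, so they are left as
        # "auto" (ASSUMPTION: filled in by the Hugging Face Trainer).
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto",
        },
    },
    # Copied from the fragment above.
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1300000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1300000000,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": 8,
}

# Serialize to the JSON text that would go into a config file.
config_json = json.dumps(ds_config, indent=4)
print(config_json)
```
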
## Usage
```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
summarizer = pipeline("summarization", model="henryu-lin/t5-large-samsum-deepspeed")

conversation = '''Henry: Hey, is Nate coming over to watch the movie tonight?
Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell.
Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.
Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too.
Henry: Nice, I'm really looking forward to seeing them again.
'''

# The pipeline returns a list with one dict per input; the summary is under "summary_text".
result = summarizer(conversation)
print(result[0]["summary_text"])
```

## Results
| ROUGE | Score |
| ----- | ----- |
| eval_rouge1 | 53.0823 |
| eval_rouge2 | 28.7097 |
| eval_rougeL | 43.939 |
| eval_rougeLsum | 49.067 |
| predict_rouge1 | 51.6716 |
| predict_rouge2 | 26.5372 |
| predict_rougeL | 42.9681 |
| predict_rougeLsum | 47.4084 |

| Metric | Value |
| ------ | ----- |
| eval_gen_len | 26.4071 |
| predict_gen_len | 25.9451 |
| train_loss | 1.3212629926497115 |
| eval_loss | 1.23828125 |
| predict_loss | 1.2333984375 |
| train_runtime | 515.2198 |
| train_samples | 14732 |
| train_samples_per_second | 85.781 |
| train_steps_per_second | 1.345 |
| eval_runtime | 61.275 |
| eval_samples | 818 |
| eval_samples_per_second | 13.35 |
| eval_steps_per_second | 0.212 |
| predict_runtime | 63.3732 |
| predict_samples | 819 |
| predict_samples_per_second | 12.923 |
| predict_steps_per_second | 0.205 |
| total_steps | 693 |
| total_flos | 7.20140924616704e+16 |
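
The throughput rows can be reproduced from the sample counts and runtimes in the same table. The sketch below assumes that runtimes are in seconds and that the training throughput counts every sample once per epoch over 3 epochs; the helper name `per_second` is hypothetical.

```python
def per_second(count: float, runtime_seconds: float) -> float:
    """Throughput rounded to three decimals, matching the table's precision."""
    return round(count / runtime_seconds, 3)


EPOCHS = 3  # from the Hyperparameters section

# Training: every sample is seen once per epoch.
train_samples_per_second = per_second(14732 * EPOCHS, 515.2198)
train_steps_per_second = per_second(693, 515.2198)

# Evaluation and prediction: a single pass over each split.
eval_samples_per_second = per_second(818, 61.275)
predict_samples_per_second = per_second(819, 63.3732)

print(train_samples_per_second, train_steps_per_second)
print(eval_samples_per_second, predict_samples_per_second)
```
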