--- language: en tags: - azureml - t5 - summarization - deepspeed license: apache-2.0 datasets: - samsum model-index: - name: t5-3b-samsum-deepspeed results: - task: name: Abstractive Text Summarization type: abstractive-text-summarization dataset: name: "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization" type: samsum widget: - text: | Henry: Hey, is Nate coming over to watch the movie tonight? Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell. Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too. Henry: Nice, I'm really looking forward to seeing them again. --- ## `t5-3b-samsum-deepspeed` This model was trained using Microsoft's `AzureML` and `DeepSpeed`'s ZeRO 2 optimization. It was fine-tuned on the `SAMSum` corpus from `t5-3b` checkpoint. More information on the fine-tuning process (includes samples and benchmarks): *(currently still WIP, updates coming soon: 7/6/21~7/9/21)* ## Resource Usage These results are retrieved from AzureML Studio's resource monitoring module. All experiments were ran on AzureML's low priority clusters. | key | value | | --- | ----- | | AzureML SKU | ND40rs_v2 (8 X V100 32GB) | | Region | US West 2 | | Run Duration | 43m 51.05s | | Compute Cost (LowPriority/Dedicated) | $3.22/$16.10 (USD) | | Average CPU Utilization | 46.0% | | Average GPU Utilization | 56.9% | | GPU Memory Usage (Avg/Peak) | 26.77/30.49 (GB) | | Total GPU Energy Usage | 2448.69 (kJ) | *Compute cost is calculated from run duration and SKU's price per hour. Updated SKU pricing could be found here: https://azure.microsoft.com/en-us/pricing/details/machine-learning/ *Peak memory usage is calculated from average peak across all utilized GPUs. ### Carbon Emissions These results are obtained using `codecarbon`. The carbon emission is estimated from training runtime only (excluding setup and evaluation runtime). CodeCarbon: https://github.com/mlco2/codecarbon | key | value | | --- | ----- | | timestamp | 2021-07-06T21:57:39 | | duration | 1841.4621863365173 | | emissions | 0.17802492531467784 | | energy_consumed | 0.5982020339874927 | | country_name | USA | | region | Washington | | cloud_provider | azure | | cloud_region | westus2 | ## Hyperparameters ```yaml fp16: True per device batch size: 2 effective batch size: 16 epoch: 3.0 learning rate: 3e-5 weight decay: 0.0 seed: 1 ``` *Same `per device batch size` for evaluations ### DeepSpeed Optimizer = `AdamW`, Scheduler = `WarmupDecayLR`, Offload = `none` ```json "zero_optimization": { "stage": 2, "allgather_partitions": true, "allgather_bucket_size": 1000000000, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 1000000000, "contiguous_gradients": true } ``` ## Usage ```python from transformers import pipeline summarizer = pipeline("summarization", model="henryu-lin/t5-3b-samsum-deepspeed") conversation = '''Henry: Hey, is Nate coming over to watch the movie tonight? Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet? It's starting to make the kitchen really smell. Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class. Kevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend too. Henry: Nice, I'm really looking forward to seeing them again. ''' summarizer(conversation) ``` ## Results | ROUGE | Score | | ----- | ----- | | eval_rouge1 | 54.7875 | | eval_rouge2 | 30.565 | | eval_rougeL | 45.7625 | | eval_rougeLsum | 50.3915 | | predict_rouge1 | 53.6628 | | predict_rouge2 | 29.0196 | | predict_rougeL | 45.1257 | | predict_rougeLsum | 49.171 | | Metric | Value | | ------ | ----- | | eval_gen_len | 25.3399 | | predict_gen_len | 24.9133 | | train_loss | 1.1206104169494209 | | eval_loss | 1.0732421875 | | predict_loss | 1.087890625 | | train_runtime | 1841.3751 | | train_samples | 14732 | | train_samples_per_second | 24.002 | | train_steps_per_second | 1.501 | | eval_runtime | 163.8357 | | eval_samples | 818 | | eval_samples_per_second | 4.993 | | eval_steps_per_second | 0.317 | | predict_runtime | 168.8245 | | predict_samples | 819 | | predict_samples_per_second | 4.851 | | predict_steps_per_second | 0.308 | | total_steps | 2763 | | total_flos | 1.84452086400811e+17 |