---
license: mit
---

# Overview

This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

## SAMsum

SAMsum is a corpus of roughly 16k messenger-style dialogues with corresponding summaries. Example entry:

- Dialogue - "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
- Summary - "Amanda baked cookies and will bring Jerry some tomorrow."

## LoRA

[LoRA](https://github.com/microsoft/LoRA) (Low-Rank Adaptation) is a parameter-efficient mechanism for fine-tuning models on downstream tasks.

> An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.

In this case we train flan-t5-large on the SAMsum dataset to produce a model that is better at dialogue summarization.

## Flan T5

Finetuned LAnguage Net Text-to-Text Transfer Transformer (Flan T5) is an LLM published by Google in 2022. It improves on T5's zero-shot learning abilities. The [flan-t5 model](https://huggingface.co/google/flan-t5-large) is open and free for commercial use.

Flan T5 capabilities include:

- Translate between more than 60 languages.
- Provide summaries of text.
- Answer general questions: "how many minutes should I cook my egg?"
- Answer historical questions, and questions related to the future.
- Solve math problems when given the reasoning.

> T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing. This means that for training, we always need an input sequence and a corresponding target sequence. The input sequence is fed to the model using input_ids. The target sequence is shifted to the right, i.e., prepended by a start-sequence token and fed to the decoder using the decoder_input_ids. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the labels. The PAD token is hereby used as the start-sequence token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.

# Code to Create The SAMsum LoRA Adapter

## Notebook Source

[Notebook used to create LoRA adapter](https://colab.research.google.com/drive/1z_mZL6CIRRA4AeF6GXe-zpfEGqqdMk-f?usp=sharing)

## Load the samsum dataset used to fine-tune the flan-t5-large model

```
from datasets import load_dataset

dataset = load_dataset("samsum")
```

## Prepare the dataset

```
# ... see notebook for the full tokenization code

# save datasets to disk for later easy loading
tokenized_dataset["train"].save_to_disk("data/train")
tokenized_dataset["test"].save_to_disk("data/eval")
```
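The full preprocessing code lives in the notebook linked above. The sketch below shows roughly what that tokenization step looks like for a T5-style summarization setup; the `preprocess_function` name, the `summarize: ` prefix, and the sequence lengths are illustrative assumptions, not values taken from the notebook.

```
from transformers import AutoTokenizer

model_id = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# illustrative sequence lengths; the notebook may use different values
max_source_length = 512
max_target_length = 128

def preprocess_function(examples):
    # optionally prefix each dialogue with an instruction, as is common for T5-style models
    inputs = ["summarize: " + dialogue for dialogue in examples["dialogue"]]
    model_inputs = tokenizer(inputs, max_length=max_source_length, truncation=True)
    # tokenize the reference summaries as labels
    labels = tokenizer(text_target=examples["summary"], max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(
    preprocess_function, batched=True, remove_columns=["id", "dialogue", "summary"]
)
```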
## Load the flan-t5-large model

Loading in 8-bit greatly reduces the amount of GPU memory required. When combined with the accelerate library, device_map="auto" will use all available GPUs for training.

```
import torch
from transformers import AutoModelForSeq2SeqLM

model_id = "google/flan-t5-large"

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, load_in_8bit=True, device_map="auto", torch_dtype=torch.float16
)
```

## Define LoRA config and prepare the model for training

```
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

# prepare int-8 model for training
model = prepare_model_for_int8_training(model)

# add LoRA adaptor
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

## Create data collator

Data collators are objects that form a batch from a list of dataset elements.

```
from transformers import DataCollatorForSeq2Seq

# we want to ignore tokenizer pad token in the loss
label_pad_token_id = -100

# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)
```

## Create the training arguments and trainer

```
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

output_dir = "lora-flan-t5-large"

# Define training args
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,  # higher learning rate
    num_train_epochs=5,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=500,
    save_strategy="no",
    report_to="tensorboard",
)

# Create Trainer instance
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset["train"],
)

model.config.use_cache = False  # re-enable for inference!
```

## Train the model!

This will take about 5-6 hours on a single T4 GPU.

```
trainer.train()
```

| Step | Training Loss |
|------|---------------|
| 500  | 1.302200 |
| 1000 | 1.306300 |
| 1500 | 1.341500 |
| 2000 | 1.278500 |
| 2500 | 1.237000 |
| 3000 | 1.239200 |
| 3500 | 1.250900 |
| 4000 | 1.202100 |
| 4500 | 1.165300 |
| 5000 | 1.178900 |
| 5500 | 1.181700 |
| 6000 | 1.100600 |
| 6500 | 1.119800 |
| 7000 | 1.105700 |
| 7500 | 1.097900 |
| 8000 | 1.059500 |
| 8500 | 1.047400 |
| 9000 | 1.046100 |

```
TrainOutput(global_step=9210, training_loss=1.1780610539108094, metrics={'train_runtime': 19217.7668, 'train_samples_per_second': 3.833, 'train_steps_per_second': 0.479, 'total_flos': 8.541847343333376e+16, 'train_loss': 1.1780610539108094, 'epoch': 5.0})
```

## Save the model to disk, zip, and download

```
peft_model_id = "flan-t5-large-samsum"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)
trainer.model.base_model.save_pretrained(peft_model_id)

!zip -r /content/flan-t5-large-samsum.zip /content/flan-t5-large-samsum

from google.colab import files
files.download("/content/flan-t5-large-samsum.zip")
```

Upload the contents of that zip file to Hugging Face to publish the adapter.
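Once the adapter is published, it can be loaded on top of the base flan-t5-large model for inference. The snippet below is a minimal sketch that assumes the adapter and tokenizer were saved under the `flan-t5-large-samsum` name used above; adjust the path or Hub repo id to wherever you uploaded it, and match whatever prompt format your preprocessing used.

```
import torch
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# path or Hub repo id of the saved adapter; adjust to where you uploaded it
peft_model_id = "flan-t5-large-samsum"

# load the base model, then apply the LoRA adapter on top of it
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

dialogue = (
    "Amanda: I baked cookies. Do you want some? "
    "Jerry: Sure! "
    "Amanda: I'll bring you tomorrow :-)"
)

# if your preprocessing added an instruction prefix (e.g. "summarize: "), use the same prefix here
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=60)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```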