---
license: mit
---

# Overview

This model has been fine-tuned for text summarization. It was created by using LoRA to fine-tune the flan-t5-large model on the [SAMsum training dataset](https://huggingface.co/datasets/samsum).

## SAMsum

SAMsum is a corpus of roughly 16k messenger-style dialogues with corresponding summaries. Example entry:

- Dialogue - "Amanda: I baked cookies. Do you want some? Jerry: Sure! Amanda: I'll bring you tomorrow :-)"
- Summary - "Amanda baked cookies and will bring Jerry some tomorrow."

## LoRA

[LoRA](https://github.com/microsoft/LoRA) (Low-Rank Adaptation) is a parameter-efficient mechanism for fine-tuning models on downstream tasks.

> An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency.

In this case we train flan-t5-large on the SAMsum dataset to produce a model that is better at dialogue summarization.

## Flan T5

Finetuned LAnguage Net Text-to-Text Transfer Transformer (Flan T5) is an LLM published by Google in 2022. It improves on T5's zero-shot learning abilities. The [flan-t5 model](https://huggingface.co/google/flan-t5-large) is open and free for commercial use.

Flan T5 capabilities include:

- Translate between more than 60 languages.
- Provide summaries of text.
- Answer general questions: "how many minutes should I cook my egg?"
- Answer historical questions, and questions related to the future.
- Solve math problems when given the reasoning.

> T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing. This means that for training, we always need an input sequence and a corresponding target sequence. The input sequence is fed to the model using input_ids. The target sequence is shifted to the right, i.e., prepended by a start-sequence token and fed to the decoder using the decoder_input_ids. In teacher-forcing style, the target sequence is then appended by the EOS token and corresponds to the labels. The PAD token is hereby used as the start-sequence token. T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.

# Code to Create The SAMsum LoRA Adapter

## Notebook Source

[Notebook used to create LoRA adapter](https://colab.research.google.com/drive/1z_mZL6CIRRA4AeF6GXe-zpfEGqqdMk-f?usp=sharing)

## Load the samsum dataset used to fine-tune the flan-t5-large model

```
from datasets import load_dataset

dataset = load_dataset("samsum")
```

## Prepare the dataset

```
# ... see notebook for the full tokenization code

# save datasets to disk for later easy loading
tokenized_dataset["train"].save_to_disk("data/train")
tokenized_dataset["test"].save_to_disk("data/eval")
```
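The full preprocessing code lives in the notebook linked above. The sketch below shows roughly what that tokenization step looks like for a T5-style summarization setup; the `preprocess_function` name, the `summarize: ` prefix, and the sequence lengths are illustrative assumptions, not values taken from the notebook.

```
from transformers import AutoTokenizer

model_id = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# illustrative sequence lengths; the notebook may use different values
max_source_length = 512
max_target_length = 128

def preprocess_function(examples):
    # optionally prefix each dialogue with an instruction, as is common for T5-style models
    inputs = ["summarize: " + dialogue for dialogue in examples["dialogue"]]
    model_inputs = tokenizer(inputs, max_length=max_source_length, truncation=True)
    # tokenize the reference summaries as labels
    labels = tokenizer(text_target=examples["summary"], max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(
    preprocess_function, batched=True, remove_columns=["id", "dialogue", "summary"]
)
```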
## Load the flan-t5-large model

Loading in 8-bit greatly reduces the amount of GPU memory required. When combined with the accelerate library, device_map="auto" will use all available GPUs for training.

```
import torch
from transformers import AutoModelForSeq2SeqLM

model_id = "google/flan-t5-large"

model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, load_in_8bit=True, device_map="auto", torch_dtype=torch.float16
)
```

## Define LoRA config and prepare the model for training

```
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

# prepare int-8 model for training
model = prepare_model_for_int8_training(model)

# add LoRA adaptor
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

## Create data collator

Data collators are objects that form a batch from a list of dataset elements.

```
from transformers import DataCollatorForSeq2Seq

# we want to ignore tokenizer pad token in the loss
label_pad_token_id = -100

# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)
```

## Create the training arguments and trainer

```
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

output_dir = "lora-flan-t5-large"

# Define training args
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3,  # higher learning rate
    num_train_epochs=5,
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=500,
    save_strategy="no",
    report_to="tensorboard",
)

# Create Trainer instance
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset["train"],
)

model.config.use_cache = False  # re-enable for inference!
```

## Train the model!

This will take about 5-6 hours on a single T4 GPU.

```
trainer.train()
```

| Step | Training Loss |
|------|---------------|
| 500  | 1.302200 |
| 1000 | 1.306300 |
| 1500 | 1.341500 |
| 2000 | 1.278500 |
| 2500 | 1.237000 |
| 3000 | 1.239200 |
| 3500 | 1.250900 |
| 4000 | 1.202100 |
| 4500 | 1.165300 |
| 5000 | 1.178900 |
| 5500 | 1.181700 |
| 6000 | 1.100600 |
| 6500 | 1.119800 |
| 7000 | 1.105700 |
| 7500 | 1.097900 |
| 8000 | 1.059500 |
| 8500 | 1.047400 |
| 9000 | 1.046100 |

```
TrainOutput(global_step=9210, training_loss=1.1780610539108094, metrics={'train_runtime': 19217.7668, 'train_samples_per_second': 3.833, 'train_steps_per_second': 0.479, 'total_flos': 8.541847343333376e+16, 'train_loss': 1.1780610539108094, 'epoch': 5.0})
```

## Save the model to disk, zip, and download

```
peft_model_id = "flan-t5-large-samsum"
trainer.model.save_pretrained(peft_model_id)
tokenizer.save_pretrained(peft_model_id)
trainer.model.base_model.save_pretrained(peft_model_id)

!zip -r /content/flan-t5-large-samsum.zip /content/flan-t5-large-samsum

from google.colab import files
files.download("/content/flan-t5-large-samsum.zip")
```

Upload the contents of that zip file to Hugging Face to publish the adapter.
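Once the adapter is published, it can be loaded on top of the base flan-t5-large model for inference. The snippet below is a minimal sketch that assumes the adapter and tokenizer were saved under the `flan-t5-large-samsum` name used above; adjust the path or Hub repo id to wherever you uploaded it, and match whatever prompt format your preprocessing used.

```
import torch
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# path or Hub repo id of the saved adapter; adjust to where you uploaded it
peft_model_id = "flan-t5-large-samsum"

# load the base model, then apply the LoRA adapter on top of it
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

dialogue = (
    "Amanda: I baked cookies. Do you want some? "
    "Jerry: Sure! "
    "Amanda: I'll bring you tomorrow :-)"
)

# if your preprocessing added an instruction prefix (e.g. "summarize: "), use the same prefix here
inputs = tokenizer("summarize: " + dialogue, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=60)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```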