---
license: mit
datasets:
- samsum
language:
- en
tags:
- summarization
- text-generation
- toxicity-reduction
- reinforcement-learning
widget:
- text: |+
    Summarize the following Conversation:
    Kate: Good morning.
    Kai: Hi! How official!
    Kate: I wrote it at 4am
    Kai: I've noticed. Why?
    Kate: I had to get up early to catch the bus to the airport
    Kai: Where are you flying?
    Kate: To Antwerp! I'm fed up with Cambridge
    Kai: poor thing. Why?
    Kate: Just a stupid, elitist place without a soul. Or with a soul made of money.
    Kai: Try to rest a bit in Belgium, do not work too much.
    Kate: I have to work, but atleast not in this soulless place.
    Kai: When are you coming back?
    Kate: I have to see my supervisor on Monday
    Kai: not too long a break
    Kate: Still better than nothing.
    Summary:
  example_title: Summarization Example 1
- text: |+
    Summarize the following Conversation:
    Dean: I feel sick Scott: hungover?
    Dean: no, like I ate something bad
    Scott: what did you eat yesterday?
    Dean: breakfast at Coffee Lovers
    Scott: this is a rather safe place
    Dean: and Chinese from TaoTao for dinner
    Scott: now we have a suspect
    Summary:
  example_title: Summarization Example 2
pipeline_tag: text2text-generation
inference:
  parameters:
    max_new_tokens: 50
    repetition_penalty: 2.5
    top_p: 0.95
    top_k: 50
    temperature: 0.9
    no_repeat_ngram_size: 10
    num_return_sequences: 1
    do_sample: true
---
# Flan-T5 (base-sized) Dialogue Summarization with reduced toxicity using RLAIF
This model is a [Flan-T5 model](https://huggingface.co/google/flan-t5-base) fine-tuned in **two stages**: first on the [SAMSUM](https://huggingface.co/datasets/samsum) dataset, then with **Reinforcement Learning from AI Feedback (RLAIF)** to detoxify its outputs. <br>
Anthropic's Constitutional AI [paper](https://arxiv.org/abs/2212.08073) from 2022 provides some great insights into how RLAIF can be leveraged. Do check it out if you're interested!<br>
More specifically, I've fine-tuned this model on a single downstream task, dialogue summarization on the dataset mentioned above, with the primary objective of reducing toxicity in the generated summaries.
## Model description
This model has the same architecture and parameters as its base model. Please refer to this [link](https://arxiv.org/abs/2210.11416) for more details about the model.
## Intended Use & Limitations
This model is intended to summarize a given dialogue so that the summary is less toxic, even when the dialogue itself contains toxic phrases or words.<br>
I've fine-tuned the model with the instruction `Summarize the following Conversation:` prepended to each dialogue, followed by the `Summary: ` keyword at the end, which marks the start of the summary.
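The prompt layout described above can be sketched as a small helper. This is only an illustration: `build_prompt` and its `sep` parameter are hypothetical names, not part of the model or any library, and joining turns with a single space is an assumption taken from the single-line example in the `Usage` section.

```python
def build_prompt(turns, sep=" "):
    """Format (speaker, utterance) pairs into the instruction prompt.

    `build_prompt` is a hypothetical helper for illustration only.
    `sep` separates the turns; a space is assumed here, a newline also
    matches the widget examples in this card.
    """
    dialogue = sep.join(f"{speaker}: {utterance}" for speaker, utterance in turns)
    # Instruction first, dialogue in the middle, `Summary:` keyword last
    return f"Summarize the following Conversation:{sep}{dialogue}{sep}Summary:"


prompt = build_prompt([
    ("Dean", "I feel sick"),
    ("Scott", "hungover?"),
])
print(prompt)
# Summarize the following Conversation: Dean: I feel sick Scott: hungover? Summary:
```

The resulting string can be passed to the tokenizer as-is.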
Note:
1. Because the model is trained primarily to reduce toxicity in its outputs, it can sometimes produce relatively short summaries that (rarely) miss an important message in the dialogue, while still staying true to that primary goal.
2. Currently, Hugging Face's Hosted Inference API doesn't directly support PEFT model files for the Text2Text-Generation pipeline, so please follow the steps in the `Usage` section below to load and use the model.
## Usage
You can use this model directly to get the summaries:
```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the PEFT config, which records the base checkpoint the adapter was trained on
peft_model_id = "DeathReaper0965/flan-t5-samsum-lora-RLAIF-detoxified"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model and tokenizer
# If required, you can add `load_in_8bit=True` to load the model in 8-bit
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto")

# Tokenize the prompt (instruction + dialogue + `Summary:`) and move it to the available device
input_ids = tokenizer.encode(
    "Summarize the following Conversation: Dean: I feel sick Scott: hungover? Dean: no, like I ate something bad Scott: what did you eat yesterday? Dean: breakfast at Coffee Lovers' Scott: this is a rather safe place Dean: and Chinese from TaoTao for dinner Scott: now we have a suspect Summary:",
    return_tensors="pt"
).to("cuda" if torch.cuda.is_available() else "cpu")

# Generate the summary with sampling
summary = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    repetition_penalty=2.5,
    top_p=0.95,
    top_k=50,
    temperature=0.6,
    no_repeat_ngram_size=2,
    num_return_sequences=1,
    do_sample=True,
)

# `batch_decode` returns a list with one string per generated sequence
output = tokenizer.batch_decode(summary, skip_special_tokens=True)

# Example output (sampling is enabled, so results may vary):
# "Dean ate breakfast at Coffee Lovers' yesterday and Chinese from TaoTao for dinner."
```
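If you need to summarize several dialogues, the steps above could be wrapped in a small function. The sketch below is hedged: `summarize` is a hypothetical helper, it assumes `model` and `tokenizer` were loaded as in the snippet above, and it assumes the loaded model exposes a `.device` attribute, as `transformers` models do.

```python
def summarize(dialogue, model, tokenizer, **generate_kwargs):
    """Return one decoded summary for a raw dialogue string.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the
    objects loaded in the usage snippet; `generate_kwargs` are passed
    through to `model.generate` unchanged.
    """
    # Wrap the dialogue in the instruction format the model was fine-tuned on
    prompt = f"Summarize the following Conversation: {dialogue} Summary:"
    # Assumption: `model.device` points at the device the inputs should live on
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    generated = model.generate(input_ids=input_ids, **generate_kwargs)
    # `batch_decode` returns a list; take the first (and only) sequence
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


# Example call, assuming `model` and `tokenizer` exist:
# summarize("Dean: I feel sick Scott: hungover?", model, tokenizer,
#           max_new_tokens=256, do_sample=True, temperature=0.6)
```

Passing the generation settings as keyword arguments keeps the sampling parameters from the snippet above in one place at the call site.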
> Designed and Developed with <span style="color: #e25555;">♥</span> by [Praneet](https://deathreaper0965.github.io/) | [LinkedIn](http://linkedin.com/in/deathreaper0965) | [GitHub](https://github.com/DeathReaper0965/) |