---
license: apache-2.0
datasets:
- humarin/chatgpt-paraphrases
language:
- en
tags:
- paraphrase
- similar text
---
This model further fine-tunes the [ChatGPT Paraphraser on T5 Base](https://huggingface.co/humarin/chatgpt_paraphraser_on_T5_base) model on the Google PAWS dataset.
## Usage example
```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Use "cuda" for GPU, otherwise "cpu"
device = "cuda"

model = AutoModelForSeq2SeqLM.from_pretrained("sharad/ParaphraseGPT").to(device)
tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
predict = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=device)


def paraphrase(sentence):
    # Diverse beam search: 3 beams split into 3 groups, with a diversity
    # penalty to push groups apart and n-gram blocking to curb repetition.
    generated = predict(
        sentence,
        num_beams=3,
        num_beam_groups=3,
        num_return_sequences=1,
        diversity_penalty=2.0,
        no_repeat_ngram_size=2,
        repetition_penalty=0.99,
        max_length=len(sentence),  # rough character-based cap on output length
    )
    return generated


output = paraphrase("My sentence to paraphrase...")
print(output[0]["generated_text"])
```
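To get several candidate paraphrases instead of one, you can raise `num_return_sequences` up to the number of beams; a minimal sketch reusing the `predict` pipeline from above (parameter values here are illustrative, not taken from this card):

```python
# Ask diverse beam search for one candidate per beam group (3 here).
candidates = predict(
    "My sentence to paraphrase...",
    num_beams=3,
    num_beam_groups=3,
    num_return_sequences=3,  # must be <= num_beams
    diversity_penalty=2.0,
    no_repeat_ngram_size=2,
    max_length=128,
)
for candidate in candidates:
    print(candidate["generated_text"])
```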
## Train parameters
- epochs = 4
- max_length = 128
- lr = 5e-5
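For reference, a minimal sketch of how a run with these parameters could look using `Seq2SeqTrainer`. The PAWS `labeled_final` config, the `sentence1`/`sentence2` column names, the `paraphrase:` task prefix, and the batch size are assumptions, not details taken from the actual training script:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
model = AutoModelForSeq2SeqLM.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")

# PAWS "labeled_final" config; keep only true paraphrase pairs (label == 1).
paws = load_dataset("paws", "labeled_final", split="train")
paws = paws.filter(lambda ex: ex["label"] == 1)


def preprocess(ex):
    # The "paraphrase:" prefix is an assumption about the task formatting.
    inputs = tokenizer("paraphrase: " + ex["sentence1"],
                       max_length=128, truncation=True)  # max_length = 128
    labels = tokenizer(text_target=ex["sentence2"],
                       max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs


tokenized = paws.map(preprocess, remove_columns=paws.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="paraphrase-t5",
    num_train_epochs=4,              # epochs = 4
    learning_rate=5e-5,              # lr = 5e-5
    per_device_train_batch_size=16,  # batch size not stated on this card
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```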