---
license: apache-2.0
datasets:
- humarin/chatgpt-paraphrases
language:
- en
tags:
- paraphrase
- similar text
---
|
This model further fine-tunes the [ChatGPT Paraphraser on T5 Base](https://huggingface.co/humarin/chatgpt_paraphraser_on_T5_base) model on the Google PAWS paraphrase dataset.
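PAWS labels each sentence pair as a paraphrase or not, so the pairs must be filtered down to true paraphrases before fine-tuning. A minimal sketch of that step (the helper name and the tiny stand-in rows are illustrative; the real rows would come from `load_dataset("paws", "labeled_final")` in the `datasets` library):

```python
def paraphrase_pairs(examples):
    """Keep only the rows PAWS labels as true paraphrases (label == 1)."""
    return [(ex["sentence1"], ex["sentence2"]) for ex in examples if ex["label"] == 1]

# Tiny stand-in for PAWS rows; the real columns are sentence1/sentence2/label.
sample = [
    {"sentence1": "The cat sat.", "sentence2": "A cat was sitting.", "label": 1},
    {"sentence1": "He left early.", "sentence2": "He arrived late.", "label": 0},
]
print(paraphrase_pairs(sample))  # [('The cat sat.', 'A cat was sitting.')]
```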
|
|
|
## Usage example |
|
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSeq2SeqLM.from_pretrained("sharad/ParaphraseGPT").to(device)
tokenizer = AutoTokenizer.from_pretrained("humarin/chatgpt_paraphraser_on_T5_base")
predict = pipeline("text2text-generation", model=model, tokenizer=tokenizer)


def paraphrase(sentence):
    generated = predict(
        sentence,
        num_beams=3,
        num_beam_groups=3,       # diverse beam search: 3 groups of 1 beam each
        num_return_sequences=1,
        diversity_penalty=2.0,
        no_repeat_ngram_size=2,
        repetition_penalty=0.99,
        max_length=128,          # maximum output length in tokens, matching training
    )
    return generated


output = paraphrase('My sentence to paraphrase...')
print(output[0]['generated_text'])
```
|
|
|
## Train parameters |
|
```python
epochs = 4
max_length = 128
lr = 5e-5
```
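These hyperparameters could drive a standard `Seq2SeqTrainer` run along the following lines. This is a sketch, not the exact training script: the PAWS `"labeled_final"` config, the plain sentence-in/sentence-out formatting, and the batch size of 16 are assumptions not stated in this card.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

base = "humarin/chatgpt_paraphraser_on_T5_base"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

max_length = 128

def tokenize(batch):
    # Input: first sentence of each pair; target: its paraphrase.
    inputs = tokenizer(batch["sentence1"], max_length=max_length, truncation=True)
    labels = tokenizer(text_target=batch["sentence2"], max_length=max_length,
                       truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

paws = load_dataset("paws", "labeled_final", split="train")
paws = paws.filter(lambda ex: ex["label"] == 1)  # keep true paraphrase pairs
train = paws.map(tokenize, batched=True, remove_columns=paws.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="ParaphraseGPT",
    num_train_epochs=4,              # epochs = 4
    learning_rate=5e-5,              # lr = 5e-5
    per_device_train_batch_size=16,  # assumed; not stated in the card
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```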