---
language: en
datasets:
- tapaco
---
# T5-small for paraphrase generation

Google's T5-small fine-tuned on the [TaPaCo](https://huggingface.co/datasets/tapaco) dataset for paraphrase generation.

## Model in Action 🚀

Install `transformers`, `torch` and `sentencepiece` (required by `T5Tokenizer`), then load the model and sample paraphrases for a sentence:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("hetpandya/t5-small-tapaco")
model = T5ForConditionalGeneration.from_pretrained("hetpandya/t5-small-tapaco")


def get_paraphrases(sentence, prefix="paraphrase: ", n_predictions=5, top_k=120, max_length=256, device="cpu"):
    # Build the model input: task prefix + sentence + end-of-sequence marker.
    text = prefix + sentence + " </s>"
    encoding = tokenizer(text, return_tensors="pt")
    input_ids = encoding["input_ids"].to(device)
    attention_mask = encoding["attention_mask"].to(device)

    # The model must be on the same device as its inputs.
    model.to(device)

    # Sample n_predictions candidates with top-k and nucleus (top-p) sampling.
    model_output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        do_sample=True,
        max_length=max_length,
        top_k=top_k,
        top_p=0.98,
        num_return_sequences=n_predictions,
    )

    # Decode, then drop candidates equal to the input (ignoring case) or to an earlier candidate.
    outputs = []
    for output in model_output:
        generated_sent = tokenizer.decode(
            output, skip_special_tokens=True, clean_up_tokenization_spaces=True
        )
        if generated_sent.lower() != sentence.lower() and generated_sent not in outputs:
            outputs.append(generated_sent)
    return outputs


paraphrases = get_paraphrases("The house will be cleaned by me every Saturday.")

for sent in paraphrases:
    print(sent)
```
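`get_paraphrases` samples with `top_k=120` and `top_p=0.98`. As a rough illustration of what these two settings do at each decoding step, here is a simplified pure-Python sketch. The helper name `top_k_top_p_filter` and the toy probabilities are hypothetical. Top-k keeps only the k most likely tokens, then nucleus (top-p) filtering keeps the smallest group of those whose renormalized probability mass reaches p. The real `transformers` implementation operates on logit tensors, so treat this as an approximation of its behavior, not its code.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Return the renormalized distribution left after top-k, then top-p, filtering.

    `probs` maps token -> probability. Hypothetical helper for illustration only.
    """
    # Top-k: keep the k most probable tokens and renormalize them.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Top-p: walk down the ranking until the cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a distribution to sample from.
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(top_k_top_p_filter(toy, top_k=3, top_p=0.8))
```

With `top_k=3` and `top_p=0.8`, `"d"` is cut by top-k and `"c"` by top-p, leaving `"a"` and `"b"` at roughly 0.625 and 0.375. A large `top_k` with `top_p` close to 1, as in `get_paraphrases`, prunes only the long tail of unlikely tokens.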

## Output

Because `get_paraphrases` samples with `do_sample=True`, results vary between runs. An example run printed:

```
The house is cleaned every Saturday by me.
The house will be cleaned on Saturday.
I will clean the house every Saturday.
I get the house cleaned every Saturday.
I will clean this house every Saturday.
```
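Because the last loop in `get_paraphrases` drops candidates that equal the input (ignoring case) or repeat an earlier candidate, a call can return fewer than `n_predictions` sentences. The same filter can be pulled out into a pure-Python helper and checked without downloading the model. `filter_candidates` is a hypothetical name, not part of `transformers`, and the candidate list below is hand-written rather than real model output.

```python
def filter_candidates(sentence, candidates):
    """Keep decoded candidates that differ from `sentence` and from each other.

    Mirrors the filtering loop in get_paraphrases: the comparison with the
    input ignores case, while the duplicate check is case-sensitive.
    """
    outputs = []
    for candidate in candidates:
        if candidate.lower() != sentence.lower() and candidate not in outputs:
            outputs.append(candidate)
    return outputs


source = "The house will be cleaned by me every Saturday."
decoded = [
    "I will clean the house every Saturday.",
    "the house will be cleaned by me every saturday.",  # same as input, ignoring case
    "I will clean the house every Saturday.",  # exact repeat
    "The house is cleaned every Saturday by me.",
]
print(filter_candidates(source, decoded))
```

Here four candidates shrink to two. If a caller needs a fixed number of distinct paraphrases, one option is to request more samples than needed and truncate the filtered list.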

## Model fine-tuning

For details on how this model was fine-tuned, see my guide:

https://towardsdatascience.com/training-t5-for-paraphrase-generation-ab3b5be151a2

Created by [Het Pandya/@hetpandya](https://github.com/hetpandya) | [LinkedIn](https://www.linkedin.com/in/het-pandya)
Made with <span style="color: red;">&hearts;</span> in India