julien-c (HF staff) committed
Commit d78f774
1 Parent(s): 75108e9

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/ramsrigouthamg/t5_paraphraser/README.md

Files changed (1)
  1. README.md +84 -0
README.md ADDED
@@ -0,0 +1,84 @@
## Model in Action 🚀

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer


def set_seed(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

model = T5ForConditionalGeneration.from_pretrained('ramsrigouthamg/t5_paraphraser')
tokenizer = T5Tokenizer.from_pretrained('ramsrigouthamg/t5_paraphraser')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device ", device)
model = model.to(device)

sentence = "Which course should I take to get started in data science?"
# sentence = "What are the ingredients required to bake a perfect cake?"
# sentence = "What is the best possible approach to learn aeronautical engineering?"
# sentence = "Do apples taste better than oranges in general?"

text = "paraphrase: " + sentence + " </s>"

max_len = 256

encoding = tokenizer.encode_plus(text, max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

# Sample 10 candidate paraphrases with top-k / top-p (nucleus) sampling
beam_outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_masks,
    do_sample=True,
    max_length=256,
    top_k=120,
    top_p=0.98,
    early_stopping=True,
    num_return_sequences=10
)

print("\nOriginal Question ::")
print(sentence)
print("\n")
print("Paraphrased Questions :: ")
final_outputs = []
for beam_output in beam_outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    # Keep only paraphrases that differ from the input and are not duplicates
    if sent.lower() != sentence.lower() and sent not in final_outputs:
        final_outputs.append(sent)

for i, final_output in enumerate(final_outputs):
    print("{}: {}".format(i, final_output))
```
## Output
```
Original Question ::
Which course should I take to get started in data science?


Paraphrased Questions ::
0: What should I learn to become a data scientist?
1: How do I get started with data science?
2: How would you start a data science career?
3: How can I start learning data science?
4: How do you get started in data science?
5: What's the best course for data science?
6: Which course should I start with for data science?
7: What courses should I follow to get started in data science?
8: What degree should be taken by a data scientist?
9: Which course should I follow to become a Data Scientist?
```
+
82
+ ## Detailed blog post available here :
83
+ https://towardsdatascience.com/paraphrase-any-question-with-t5-text-to-text-transfer-transformer-pretrained-model-and-cbb9e35f1555
84
+