kalpeshk2011 committed
Commit 75958be
Parent: 39161e8

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -34,7 +34,7 @@ DIPPER ("**Di**scourse **P**ara**p**hras**er**") is a 11B parameter paraphrase g
 
 We leverage the PAR3 dataset publicly released by Thai et al. (2022) to train DIPPER. This dataset contains multiple translations of non-English novels into English aligned at a paragraph level (e.g., it contains both the Henry Morley and Robert Adams translations of Voltaire’s Candide), which we treat as paragraph-level paraphrases and use to train our paraphraser.
 
-## Using DIPPER
+## Using DIPPER (no-context)
 
 Full instructions: https://github.com/martiansideofthemoon/ai-detection-paraphrases#running-the-paraphraser-model-dipper
 
@@ -72,8 +72,7 @@ class DipperParaphraser(object):
 
         for sent_idx in range(0, len(sentences), sent_interval):
            curr_sent_window = " ".join(sentences[sent_idx:sent_idx + sent_interval])
-            final_input_text = f"lexical = {lex_code}, order = {order_code}"
-            final_input_text += f" <sent> {curr_sent_window} </sent>"
+            final_input_text = f"lexical = {lex_code}, order = {order_code} {curr_sent_window}"
 
             final_input = self.tokenizer([final_input_text], return_tensors="pt")
             final_input = {k: v.cuda() for k, v in final_input.items()}
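For context, here is a minimal end-to-end sketch of the no-context input format this commit switches to. It assumes the tokenizer and checkpoint names from the linked instructions (google/t5-v1_1-xxl with kalpeshk2011/dipper-paraphraser-xxl); the sample sentence and diversity settings are illustrative, not taken from the diff.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Tokenizer/checkpoint names follow the linked repository instructions;
# treat them as assumptions rather than part of this diff.
tokenizer = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
model = T5ForConditionalGeneration.from_pretrained("kalpeshk2011/dipper-paraphraser-xxl")
model.cuda()
model.eval()

# DIPPER control codes are 100 minus the requested diversity
# (diversity values: 0, 20, 40, 60, 80, or 100).
lex_code = 100 - 60    # lexical diversity of 60
order_code = 100 - 0   # order diversity of 0

sentence = "They asked me where I was going, and I told them the truth."

# No-context format after this commit: control codes followed directly
# by the sentence window, with no <sent> ... </sent> markers.
final_input_text = f"lexical = {lex_code}, order = {order_code} {sentence}"

final_input = tokenizer([final_input_text], return_tensors="pt")
final_input = {k: v.cuda() for k, v in final_input.items()}

with torch.inference_mode():
    outputs = model.generate(**final_input, do_sample=True, top_p=0.75, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The renamed heading reflects the same change: in the no-context setting, the prompt carries only the control codes and the text to paraphrase, without the `<sent> ... </sent>` markers used by the previous format.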