---
datasets:
- IteraTeR_full_sent
---
# IteraTeR PEGASUS model
This model was obtained by fine-tuning [google/pegasus-large](https://huggingface.co/google/pegasus-large) on the [IteraTeR-full-sent](https://huggingface.co/datasets/wanyu/IteraTeR_full_sent) dataset.
Paper: [Understanding Iterative Revision from Human-Written Text](https://arxiv.org/abs/2203.03802) <br>
Authors: Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez, Dongyeop Kang
## Text Revision Task
Given an edit intention and an original sentence, our model generates a revised sentence.<br>
The edit intentions come from the [IteraTeR-full-sent](https://huggingface.co/datasets/wanyu/IteraTeR_full_sent) dataset and are categorized as follows:
<table>
<tr>
<th>Edit Intention</th>
<th>Definition</th>
<th>Example</th>
</tr>
<tr>
<td>clarity</td>
<td>Make the text more formal, concise, readable and understandable.</td>
<td>
Original: It's like a house which anyone can enter in it. <br>
Revised: It's like a house which anyone can enter.
</td>
</tr>
<tr>
<td>fluency</td>
<td>Fix grammatical errors in the text.</td>
<td>
Original: In the same year he became the Fellow of the Royal Society. <br>
Revised: In the same year, he became the Fellow of the Royal Society.
</td>
</tr>
<tr>
<td>coherence</td>
<td>Make the text more cohesive, logically linked and consistent as a whole.</td>
<td>
Original: Achievements and awards Among his other activities, he founded the Karachi Film Guild and Pakistan Film and TV Academy. <br>
Revised: Among his other activities, he founded the Karachi Film Guild and Pakistan Film and TV Academy.
</td>
</tr>
<tr>
<td>style</td>
<td>Convey the writer’s writing preferences, including emotions, tone, voice, etc.</td>
<td>
Original: She was last seen on 2005-10-22. <br>
Revised: She was last seen on October 22, 2005.
</td>
</tr>
</table>
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")
model = AutoModelForSeq2SeqLM.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator")

# Prefix the original sentence with its edit intention in angle brackets.
before_input = '<fluency> I likes coffee.'
model_input = tokenizer(before_input, return_tensors='pt')
# Generate the revised sentence with beam search.
model_outputs = model.generate(**model_input, num_beams=8, max_length=1024)
after_text = tokenizer.batch_decode(model_outputs, skip_special_tokens=True)[0]
print(after_text)
```
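In the example above, the input is the edit intention wrapped in angle brackets, followed by the original sentence (`'<fluency> I likes coffee.'`). The sketch below builds such inputs from `(intention, sentence)` pairs and rejects intentions outside the four listed in the table. It assumes that every intention follows the same lowercase `<intention> sentence` pattern as the `<fluency>` example and that these four labels are the only valid prefixes. `EDIT_INTENTIONS`, `build_revision_input`, and `build_batch` are hypothetical helpers, not part of the model's or the Transformers library's API.

```python
# Hypothetical helper: the four edit intentions listed in the table above,
# assumed to be the only prefixes the model expects.
EDIT_INTENTIONS = ("clarity", "fluency", "coherence", "style")


def build_revision_input(intention: str, sentence: str) -> str:
    """Prefix a sentence with its edit intention, e.g. '<fluency> I likes coffee.'."""
    intention = intention.strip().lower()
    if intention not in EDIT_INTENTIONS:
        raise ValueError(
            f"unknown edit intention {intention!r}; expected one of {EDIT_INTENTIONS}"
        )
    return f"<{intention}> {sentence.strip()}"


def build_batch(pairs):
    """Turn a list of (intention, sentence) pairs into a list of model input strings."""
    return [build_revision_input(intention, sentence) for intention, sentence in pairs]


if __name__ == "__main__":
    batch = build_batch([
        ("fluency", "I likes coffee."),
        ("clarity", "It's like a house which anyone can enter in it."),
    ])
    for text in batch:
        print(text)
```

To revise several sentences at once, a list like `batch` can be passed to the tokenizer with `tokenizer(batch, return_tensors='pt', padding=True)` and then to `model.generate` as in the usage example. `batch_decode` then returns one revised sentence per input.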