--- datasets: - IteraTeR_full_sent --- # IteraTeR PEGASUS model This model was obtained by fine-tuning [google/pegasus-large](https://huggingface.co/google/pegasus-large) on [IteraTeR-full-sent](https://huggingface.co/datasets/wanyu/IteraTeR_full_sent) dataset. Paper: [Understanding Iterative Revision from Human-Written Text](https://arxiv.org/abs/2203.03802)
Authors: Wanyu Du, Vipul Raheja, Dhruv Kumar, Zae Myung Kim, Melissa Lopez, Dongyeop Kang ## Text Revision Task Given an edit intention and an original sentence, our model can generate a revised sentence.
The edit intentions are provided by [IteraTeR-full-sent](https://huggingface.co/datasets/wanyu/IteraTeR_full_sent) dataset, which are categorized as follows:
Edit Intention Definition Example
clarity Make the text more formal, concise, readable and understandable. Original: It's like a house which anyone can enter in it.
Revised: It's like a house which anyone can enter.
fluency Fix grammatical errors in the text. Original: In the same year he became the Fellow of the Royal Society.
Revised: In the same year, he became the Fellow of the Royal Society.
coherence Make the text more cohesive, logically linked and consistent as a whole. Original: Achievements and awards Among his other activities, he founded the Karachi Film Guild and Pakistan Film and TV Academy.
Revised: Among his other activities, he founded the Karachi Film Guild and Pakistan Film and TV Academy.
style Convey the writer’s writing preferences, including emotions, tone, voice, etc.. Original: She was last seen on 2005-10-22.
Revised: She was last seen on October 22, 2005.
## Usage ```python from transformers import AutoTokenizer, AutoModelForSeq2SeqLM tokenizer = AutoTokenizer.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator") model = AutoModelForSeq2SeqLM.from_pretrained("wanyu/IteraTeR-PEGASUS-Revision-Generator") before_input = ' I likes coffee.' model_input = tokenizer(before_input, return_tensors='pt') model_outputs = model.generate(**model_input, num_beams=8, max_length=1024) after_text = tokenizer.batch_decode(model_outputs, skip_special_tokens=True)[0] ```