---
language:
- en
tags:
- formality
datasets:
- GYAFC
- Pavlick-Tetreault-2016
license: cc-by-nc-sa-4.0
---

This model is trained to predict whether an English sentence is formal or informal.

Base model: `roberta-base`

Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and the [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005) (P&T).

Data augmentation: changing texts to upper or lower case; removing all punctuation; adding a dot at the end of a sentence. It was applied because the model otherwise over-relies on punctuation and capitalization and does not pay enough attention to other features.

Loss: binary classification (on GYAFC), in-batch ranking (on P&T data).

Performance metrics on the test data:

| dataset                                      | ROC AUC | precision | recall | fscore | accuracy | Spearman |
|----------------------------------------------|---------|-----------|--------|--------|----------|----------|
| GYAFC                                        | 0.9779  | 0.90      | 0.91   | 0.90   | 0.9087   | 0.8233   |
| GYAFC normalized (lowercase + remove punct.) | 0.9234  | 0.85      | 0.81   | 0.82   | 0.8218   | 0.7294   |

| P&T subset | Spearman R |
| - | - |
| news | 0.4003 |
| answers | 0.7500 |
| blog | 0.7334 |
| email | 0.7606 |

## Citation

If you use this model in your research, please cite the [paper](https://doi.org/10.1007/978-3-031-35320-8_4) that introduced it:

```
@InProceedings{10.1007/978-3-031-35320-8_4,
  author="Babakov, Nikolay and Dale, David and Gusev, Ilya and Krotova, Irina and Panchenko, Alexander",
  editor="M{\'e}tais, Elisabeth and Meziane, Farid and Sugumaran, Vijayan and Manning, Warren and Reiff-Marganiec, Stephan",
  title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="47--61",
  isbn="978-3-031-35320-8"
}
```

## Licensing Information

[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png
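
## Data augmentation sketch

The data augmentation described above (case changes, punctuation removal, adding a final dot) could look roughly like the following. This is a minimal sketch and not the authors' training code: the function names, the probability `p`, the choice to apply at most one transformation per sentence, and the rule of skipping the dot when the sentence already ends in `.`, `!` or `?` are all assumptions made for illustration.

```python
import random
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def to_upper(text: str) -> str:
    """Change the whole text to upper case."""
    return text.upper()


def to_lower(text: str) -> str:
    """Change the whole text to lower case."""
    return text.lower()


def strip_punctuation(text: str) -> str:
    """Remove all ASCII punctuation and collapse leftover whitespace."""
    return " ".join(text.translate(_PUNCT_TABLE).split())


def add_final_dot(text: str) -> str:
    """Append a dot unless the text already ends in ., ! or ? (assumed rule)."""
    stripped = text.rstrip()
    if stripped.endswith((".", "!", "?")):
        return stripped
    return stripped + "."


# Hypothetical pool of transformations; the real pipeline may combine them differently.
AUGMENTATIONS = [to_upper, to_lower, strip_punctuation, add_final_dot]


def augment(text: str, rng: random.Random, p: float = 0.5) -> str:
    """With probability p, apply one randomly chosen transformation.

    The formality label of the example is left unchanged by the caller.
    """
    if rng.random() >= p:
        return text
    return rng.choice(AUGMENTATIONS)(text)
```

Calling `augment(sentence, random.Random(seed))` on each training sentence yields a mix of original and perturbed inputs, so capitalization and punctuation alone are less reliable cues for the label.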