---
language:
- en
tags:
- formality
datasets:
- GYAFC
- Pavlick-Tetreault-2016
license: cc-by-nc-sa-4.0
---
The model has been trained to predict whether English sentences are formal or informal.
Base model: `roberta-base`
Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005).
Data augmentation: converting texts to upper or lower case, removing all punctuation, and adding a dot at the end of a sentence. It was applied because the model otherwise relies too heavily on punctuation and capitalization and pays too little attention to other features.
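
The augmentations listed above can be sketched roughly as follows. This is only an illustration, not the training code: the function names are hypothetical, punctuation is assumed to mean ASCII punctuation, and how the transformations were combined or sampled during training is not stated here, so the `augment` helper simply returns one variant per transformation.

```python
import string

# Assumption: "punctuation" means the ASCII set in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    """Delete every ASCII punctuation character from the text."""
    return text.translate(_PUNCT_TABLE)


def add_final_dot(text: str) -> str:
    """Append a dot unless the text already ends with one."""
    stripped = text.rstrip()
    return stripped if stripped.endswith(".") else stripped + "."


def augment(text: str) -> list[str]:
    """Return one augmented variant per transformation (hypothetical helper)."""
    no_punct = remove_punctuation(text)
    return [
        text.upper(),             # all upper case
        text.lower(),             # all lower case
        no_punct,                 # punctuation removed
        add_final_dot(no_punct),  # punctuation removed, final dot added
    ]


if __name__ == "__main__":
    for variant in augment("Hi, there!"):
        print(variant)
```

Training on such variants alongside the originals is one way to keep a classifier from treating capitalization and punctuation as the only formality cues.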
Loss: binary classification (on GYAFC), in-batch ranking (on PT data).
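
The exact form of the two objectives is not given here, so the following is only a sketch under assumptions: the classification loss is taken to be binary cross-entropy on a single logit, and the in-batch ranking loss is taken to be a pairwise logistic loss over all pairs in a batch whose gold formality scores differ. The function names are hypothetical, and plain Python is used instead of a deep learning framework to keep the example self-contained.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy for one example; label is 0 or 1 (assumed objective)."""
    return softplus(logit) - label * logit


def in_batch_ranking_loss(scores: list[float], targets: list[float]) -> float:
    """Pairwise logistic ranking loss within one batch (assumed formulation).

    For every pair (i, j) where targets[i] > targets[j], the model is
    penalized unless scores[i] is clearly larger than scores[j].
    """
    losses = []
    for i, t_i in enumerate(targets):
        for j, t_j in enumerate(targets):
            if t_i > t_j:
                losses.append(softplus(-(scores[i] - scores[j])))
    # A batch without any ordered pair contributes no loss.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    print(bce_with_logits(0.0, 1))
    print(in_batch_ranking_loss([2.0, 0.0], [1.0, 0.0]))
```

A ranking objective of this kind only needs the relative order of the continuous formality scores, which is consistent with reporting Spearman correlation on the P&T data.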
Performance metrics on the test data:
| dataset | ROC AUC | precision | recall | fscore | accuracy | Spearman |
|----------------------------------------------|---------|-----------|--------|--------|----------|------------|
| GYAFC | 0.9779 | 0.90 | 0.91 | 0.90 | 0.9087 | 0.8233 |
| GYAFC normalized (lowercase + remove punct.) | 0.9234 | 0.85 | 0.81 | 0.82 | 0.8218 | 0.7294 |

| P&T subset | Spearman R |
|------------|------------|
| news       | 0.4003     |
| answers    | 0.7500     |
| blog       | 0.7334     |
| email      | 0.7606     |
## Citation
If you are using the model in your research, please cite the following
[paper](https://doi.org/10.1007/978-3-031-35320-8_4) where it was introduced:
```
@InProceedings{10.1007/978-3-031-35320-8_4,
author="Babakov, Nikolay
and Dale, David
and Gusev, Ilya
and Krotova, Irina
and Panchenko, Alexander",
editor="M{\'e}tais, Elisabeth
and Meziane, Farid
and Sugumaran, Vijayan
and Manning, Warren
and Reiff-Marganiec, Stephan",
title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
booktitle="Natural Language Processing and Information Systems",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="47--61",
isbn="978-3-031-35320-8"
}
```
## Licensing Information
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png