---
language:
- en
tags:
- formality
datasets:
- GYAFC
- Pavlick-Tetreault-2016
license: cc-by-nc-sa-4.0
---

The model was trained to predict whether an English sentence is formal or informal.

Base model: `roberta-base`

Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005).

Data augmentation: converting texts to upper or lower case, removing all punctuation, and adding a period at the end of a sentence. Augmentation was applied because the model otherwise over-relies on punctuation and capitalization and pays too little attention to other features.
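
The augmentations listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' training code: the function names, the uniform random choice between transformations, and the rule for when to append a period are all assumptions.

```python
import random
import string


def remove_punctuation(text):
    """Drop every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def add_final_dot(text):
    """Append a period unless the text already ends with one."""
    stripped = text.rstrip()
    return stripped if stripped.endswith(".") else stripped + "."


# Hypothetical registry of the surface-level perturbations named in the card.
AUGMENTATIONS = {
    "upper": str.upper,
    "lower": str.lower,
    "no_punct": remove_punctuation,
    "final_dot": add_final_dot,
}


def augment(text, rng):
    """Apply one perturbation chosen uniformly at random (an assumption)."""
    name = rng.choice(sorted(AUGMENTATIONS))
    return AUGMENTATIONS[name](text)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(augment("Hi, how are you doing", rng))
```

Because the label stays the same while casing and punctuation change, the classifier gets less benefit from relying on those surface cues alone.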

Loss: binary classification loss on GYAFC; in-batch ranking loss on the P&T data.
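
The card does not spell out the exact form of the in-batch ranking loss. One common formulation, shown here purely as an assumption, compares every pair of examples in a batch and applies a pairwise logistic penalty whenever the example with the higher gold formality score does not also receive the higher predicted score. The function name and the choice of a logistic penalty are hypothetical.

```python
import math


def in_batch_ranking_loss(scores, targets):
    """Mean pairwise logistic loss over all ordered pairs in the batch.

    scores:  predicted formality scores, one per example
    targets: gold formality ratings, one per example
    A pair (i, j) contributes only when targets[i] > targets[j].
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] > targets[j]:
                diff = scores[i] - scores[j]
                # -log(sigmoid(diff)) is equal to log(1 + exp(-diff))
                total += math.log1p(math.exp(-diff))
                n_pairs += 1
    # A batch with no strictly ordered pairs carries no ranking signal.
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    gold = [2.0, 0.5, -1.0]
    print(in_batch_ranking_loss([1.5, 0.2, -0.8], gold))  # well ordered
    print(in_batch_ranking_loss([-0.8, 0.2, 1.5], gold))  # reversed
```

A ranking objective of this kind only needs the relative order of the ratings within a batch, which suits graded formality annotations better than a hard formal/informal threshold.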

Performance metrics on the test data:

| dataset                                      | ROC AUC | precision | recall | fscore | accuracy | Spearman R |
|----------------------------------------------|---------|-----------|--------|--------|----------|------------|
| GYAFC                                        | 0.9779  | 0.90      | 0.91   | 0.90   | 0.9087   | 0.8233     |
| GYAFC normalized (lowercase + remove punct.) | 0.9234  | 0.85      | 0.81   | 0.82   | 0.8218   | 0.7294     |

| P&T subset | Spearman R |
|------------|------------|
| news       | 0.4003     |
| answers    | 0.7500     |
| blog       | 0.7334     |
| email      | 0.7606     |
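
For readers unfamiliar with the Spearman R column, the sketch below shows how a Spearman rank correlation between predicted scores and gold ratings can be computed: both lists are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. This is a generic illustration with made-up numbers, not the evaluation script behind the tables; the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all values tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted = [0.9, 0.1, 0.4, 0.7]   # illustrative model scores
    gold = [2.5, -1.0, 0.3, 0.2]       # illustrative human ratings
    print(spearman(predicted, gold))
```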

## Citation
If you use the model in your research, please cite the
[paper](https://doi.org/10.1007/978-3-031-35320-8_4) that introduced it:
```
@InProceedings{10.1007/978-3-031-35320-8_4,
  author="Babakov, Nikolay
  and Dale, David
  and Gusev, Ilya
  and Krotova, Irina
  and Panchenko, Alexander",
  editor="M{\'e}tais, Elisabeth
  and Meziane, Farid
  and Sugumaran, Vijayan
  and Manning, Warren
  and Reiff-Marganiec, Stephan",
  title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="47--61",
  isbn="978-3-031-35320-8"
}
```

## Licensing Information

[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png