
This model was trained to predict whether an English sentence is formal or informal.

Base model: roberta-base

Datasets: GYAFC (Rao and Tetreault, 2018) and the online formality corpus (Pavlick and Tetreault, 2016).

Data augmentation: converting texts to upper or lower case; removing all punctuation; adding a period at the end of a sentence.
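The augmentations listed above can be sketched roughly as follows. This is a minimal illustration, not the training code used for this model: the function names, the choice to apply a single randomly picked transform with probability `p`, and the rule of not adding a period when the text already ends with one are all assumptions.

```python
import random
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def to_upper(text: str) -> str:
    return text.upper()


def to_lower(text: str) -> str:
    return text.lower()


def remove_punctuation(text: str) -> str:
    return text.translate(_PUNCT_TABLE)


def add_final_dot(text: str) -> str:
    # Assumption: skip the period if the text already ends with one.
    stripped = text.rstrip()
    return stripped if stripped.endswith(".") else stripped + "."


# Hypothetical augmentation pool mirroring the list in the model card.
AUGMENTATIONS = [to_upper, to_lower, remove_punctuation, add_final_dot]


def augment(text: str, rng: random.Random, p: float = 0.5) -> str:
    """With probability p, apply one randomly chosen augmentation (assumed scheme)."""
    if rng.random() < p:
        return rng.choice(AUGMENTATIONS)(text)
    return text
```

Presumably the formality label is kept unchanged after augmentation, so the classifier learns not to rely only on casing or punctuation; that intent is an inference, not something stated above.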

Loss: binary classification (on GYAFC), in-batch ranking (on the Pavlick and Tetreault data).
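The two losses can be illustrated with a pure-Python sketch. The exact formulations used in training are not given here, so both are assumptions: binary cross-entropy on a single formality logit for the GYAFC labels, and a pairwise logistic loss over every in-batch pair whose human formality scores differ for the Pavlick and Tetreault data. How (or whether) the two terms were combined is not specified.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_cross_entropy(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean BCE with logits; labels are 1 (formal) or 0 (informal)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))] == softplus(z) - y*z
        total += softplus(z) - y * z
    return total / len(logits)


def in_batch_ranking_loss(logits: Sequence[float], scores: Sequence[float]) -> float:
    """Assumed pairwise logistic loss over all in-batch pairs.

    For every pair (i, j) where text i has a higher human formality score
    than text j, penalize the model unless logit i exceeds logit j.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(logits)):
        for j in range(len(logits)):
            if scores[i] > scores[j]:
                total += softplus(-(logits[i] - logits[j]))
                n_pairs += 1
    # A batch where all scores are equal contributes no ranking signal.
    return total / n_pairs if n_pairs else 0.0
```

A ranking loss of this kind only uses the relative order of the human scores within a batch, which suits continuous ratings better than forcing them into binary labels.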