s-nlp/roberta-base-formality-ranker


cointegrated committed on Jul 20, 2022

Commit 507700d • 1 Parent(s): ca09afa

Update README.md

Files changed (1)

README.md +25 -1


@@ -1,9 +1,33 @@
 The model has been trained [here](https://git.mts.ai/ai/ml_lab/skoltech-nlp_lab/skoltech/task_oriented_TST/-/blob/main/transfer/formality_ranker_v1.ipynb) to predict for English sentences, whether they are formal or informal.
 Base model: `roberta-base`
 Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005).
-Data augmentation: changing texts to upper or lower case; removing all punctuation, adding dot at the end of a sentence.
 Loss: binary classification (on GYAFC), in-batch ranking (on PT data).

+---
+language:
+  - en
+tags:
+  - formality
+datasets:
+  - GYAFC
+  - Pavlick-Tetreault-2016
+---
 The model has been trained [here](https://git.mts.ai/ai/ml_lab/skoltech-nlp_lab/skoltech/task_oriented_TST/-/blob/main/transfer/formality_ranker_v1.ipynb) to predict for English sentences, whether they are formal or informal.
 Base model: `roberta-base`
 Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005).
+Data augmentation: changing texts to upper or lower case; removing all punctuation; adding a dot at the end of a sentence. It was applied because the model otherwise over-relies on punctuation and capitalization and pays too little attention to other features.
 Loss: binary classification (on GYAFC), in-batch ranking (on PT data).
+Performance metrics on the validation data:
+| dataset | ROC AUC | accuracy | Spearman R |
+| - | - | - | - |
+| GYAFC | 0.9779 | 0.9087 | 0.8233 |
+| GYAFC normalized (lowercase + remove punct.) | 0.9234 | 0.8218 | 0.7294 |
+| P&T subset | Spearman R |
+| - | - |
+| news | 0.4003 |
+| answers | 0.7500 |
+| blog | 0.7334 |
+| email | 0.7606 |
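
The data augmentation named in the README change above (case changes, punctuation removal, adding a final dot) could be sketched roughly as follows. This is a minimal illustration and not the code from the training notebook: the helper names, the use of ASCII-only `string.punctuation`, and the choice to apply exactly one randomly picked transformation per call are all assumptions.

```python
import random
import string

# Hypothetical helpers: names and details are assumptions, not the notebook's code.
# string.punctuation covers ASCII punctuation only.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def to_upper(text: str) -> str:
    """Change the whole text to upper case."""
    return text.upper()


def to_lower(text: str) -> str:
    """Change the whole text to lower case."""
    return text.lower()


def strip_punctuation(text: str) -> str:
    """Remove all ASCII punctuation characters."""
    return text.translate(_PUNCT_TABLE)


def add_final_dot(text: str) -> str:
    """Append a dot unless the text already ends with one."""
    text = text.rstrip()
    return text if text.endswith(".") else text + "."


AUGMENTATIONS = (to_upper, to_lower, strip_punctuation, add_final_dot)


def augment_formality_text(text: str, rng=None) -> str:
    """Apply one randomly chosen surface-level transformation.

    The formality label is assumed to stay unchanged, so the model cannot
    rely on capitalization or punctuation alone.
    """
    rng = rng or random
    return rng.choice(AUGMENTATIONS)(text)
```

For example, `strip_punctuation("Hi, there!")` yields `Hi there`, and passing a seeded `random.Random` instance as `rng` makes the choice of transformation reproducible.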