s-nlp/roberta-base-formality-ranker

Skolkovo Institute of Science and Technology

---
language:
- en
tags:
- formality
datasets:
- GYAFC
- Pavlick-Tetreault-2016
---

The model is trained to predict whether an English sentence is formal or informal.

Base model: `roberta-base`
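A minimal usage sketch, assuming the checkpoint is available on the Hugging Face Hub under `s-nlp/roberta-base-formality-ranker` and loads as a standard sequence-classification model through the `transformers` library with a multi-class head. The function name `formality_probabilities` is hypothetical. Label names are read from the checkpoint's config instead of being hard-coded, because the label order is not stated here.

```python
from typing import Dict, List


def formality_probabilities(
    texts: List[str],
    model_name: str = "s-nlp/roberta-base-formality-ranker",
) -> List[Dict[str, float]]:
    """Return, for each text, a mapping from class label to probability."""
    # Imported lazily so that defining the function does not require
    # the heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    # Assumption: the head outputs one logit per class, so softmax applies.
    probs = torch.softmax(logits, dim=-1)

    # Label names come from the checkpoint's config, not from this sketch.
    id2label = model.config.id2label
    return [
        {id2label[i]: float(p) for i, p in enumerate(row)}
        for row in probs
    ]
```

Calling `formality_probabilities(["I would be grateful for your reply.", "gimme a sec lol"])` downloads the checkpoint on first use and returns one label-to-probability dictionary per input text.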

Datasets: [GYAFC](https://github.com/raosudha89/GYAFC-corpus) from [Rao and Tetreault, 2018](https://aclanthology.org/N18-1012) and the [online formality corpus](http://www.seas.upenn.edu/~nlp/resources/formality-corpus.tgz) from [Pavlick and Tetreault, 2016](https://aclanthology.org/Q16-1005).

Data augmentation: converting texts to upper or lower case, removing all punctuation, and adding a period at the end of a sentence. Without this augmentation, the model relies too heavily on punctuation and capitalization and pays too little attention to other features.
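The augmentations listed above could be sketched as follows. This is an illustration and not the training code: the function names, the probability `p`, and the choice to apply one randomly selected transformation per text are all assumptions, and only ASCII punctuation is handled.

```python
import random
import string
from typing import Optional

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def to_upper(text: str) -> str:
    return text.upper()


def to_lower(text: str) -> str:
    return text.lower()


def remove_punctuation(text: str) -> str:
    # Delete punctuation, then collapse any whitespace left behind.
    return " ".join(text.translate(_STRIP_PUNCT).split())


def add_final_period(text: str) -> str:
    # Append a period unless the text already ends a sentence.
    stripped = text.rstrip()
    return stripped if stripped.endswith((".", "!", "?")) else stripped + "."


AUGMENTATIONS = (to_upper, to_lower, remove_punctuation, add_final_period)


def augment(text: str, p: float = 0.5, rng: Optional[random.Random] = None) -> str:
    """With probability p, apply one randomly chosen augmentation (assumed policy)."""
    rng = rng or random.Random()
    if rng.random() >= p:
        return text
    return rng.choice(AUGMENTATIONS)(text)
```

Training on such variants with unchanged labels means surface cues like a capital letter or a final period no longer separate the classes on their own.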

Loss: binary classification loss on GYAFC; in-batch ranking loss on the Pavlick and Tetreault (P&T) data.
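The exact loss formulas are not given here, so the following is only a sketch of one plausible reading. It assumes binary cross-entropy on a raw score for the classification part and a pairwise logistic (RankNet-style) loss over all pairs in a batch for the ranking part. The function names and the convention that label 1 means formal are hypothetical.

```python
import math
from typing import Sequence


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_classification_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy on raw scores (logits).

    Assumed convention: label 1 = formal, label 0 = informal.
    """
    # BCE with logits simplifies to softplus(s) - y * s.
    losses = [_softplus(s) - y * s for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


def in_batch_ranking_loss(scores: Sequence[float], targets: Sequence[float]) -> float:
    """Mean pairwise logistic loss over in-batch pairs with different targets.

    Assumed form: for every pair where item i has the higher gold rating,
    penalize softplus(-(score_i - score_j)).
    """
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if targets[i] > targets[j]:
                losses.append(_softplus(-(scores[i] - scores[j])))
    # A batch with no comparable pairs contributes nothing.
    return sum(losses) / len(losses) if losses else 0.0
```

A ranking loss suits the P&T data because its annotations are graded formality ratings rather than binary labels, so only the relative order of the scores within a batch is supervised.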

Performance metrics on the test data:

| dataset | ROC AUC | precision | recall | F-score | accuracy | Spearman |
|---|---|---|---|---|---|---|
| GYAFC | 0.9779 | 0.90 | 0.91 | 0.90 | 0.9087 | 0.8233 |
| GYAFC normalized (lowercase + remove punct.) | 0.9234 | 0.85 | 0.81 | 0.82 | 0.8218 | 0.7294 |
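For reference, the classification metrics in the table above can be computed from gold labels and model scores roughly as below. This is a dependency-free sketch on made-up toy inputs: the 0.5 threshold, the choice of class 1 as the positive class, and the function names are assumptions, and the averaging used for the reported precision, recall, and F-score is not stated here.

```python
from typing import Dict, Sequence


def classification_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Precision, recall, F-score, and accuracy with class 1 as positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"precision": precision, "recall": recall,
            "fscore": fscore, "accuracy": accuracy}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not taken from GYAFC.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

ROC AUC depends only on how the scores order the examples, which is why it can stay high even when a fixed threshold yields lower accuracy.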

| P&T subset | Spearman R |
|---|---|
| news | 0.4003 |
| answers | 0.7500 |
| blog | 0.7334 |
| email | 0.7606 |
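Spearman correlation is the Pearson correlation of the ranks, so it measures how well the model's scores order the sentences relative to the gold formality ratings. A dependency-free sketch follows. The function names are hypothetical and ties receive their average rank.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice, `spearman(model_scores, gold_ratings)` would be evaluated separately for each subset, where both argument names are placeholders for per-sentence lists.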