Commit 27965ca by cointegrated
1 parent: b6a2eba

Update README.md

README.md CHANGED
@@ -11,4 +11,4 @@ Corruption sources: random replacement, deletion, addition, shuffling, and re-in
 
 Data sources: web-corpora from [the Leipzig collection](https://wortschatz.uni-leipzig.de/en/download) (`rus_news_2020_100K`, `rus_newscrawl-public_2018_100K`, `rus-ru_web-public_2019_100K`, `rus_wikipedia_2021_100K`), comments from [OK](https://www.kaggle.com/alexandersemiletov/toxic-russian-comments) and [Pikabu](https://www.kaggle.com/blackmoon/russian-language-toxic-comments).
 
-On our private test dataset, the model has achieved 40% rank correlation with human
+On our private test dataset, the model has achieved 40% rank correlation with human judgements of naturalness, which is higher than GPT perplexity, another popular fluency metric.
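
For readers unfamiliar with the metric in the changed line, here is a minimal sketch of how a rank correlation between model scores and human judgements could be computed. The README does not say which rank correlation was used; Spearman's coefficient is assumed here, and the function names and the sample numbers are hypothetical, not taken from the private test set.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied items their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of equal values starting at i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical data: model fluency scores and human naturalness ratings.
    model_scores = [0.91, 0.15, 0.60, 0.42, 0.77]
    human_ratings = [5, 2, 3, 4, 4]
    print(f"Spearman rank correlation: {spearman(model_scores, human_ratings):.2f}")
```

A value of 0.40 from such a computation would correspond to the "40% rank correlation" wording. Comparing against a perplexity baseline would mean running the same function on that baseline's scores, with the sign flipped since lower perplexity indicates higher fluency.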