s-nlp/rubert-base-cased-conversational-paraphrase-v1

cointegrated committed on Jul 15, 2022

Commit e3cbd20 • 1 Parent(s): 1795234

Create README.md

Files changed (1)

README.md +34 -0

README.md ADDED

	@@ -0,0 +1,34 @@

+---
+language:
+- ru
+tags:
+- sentence-similarity
+- text-classification
+- paraphrase-detection
+datasets:
+- merionum/ru_paraphraser
+- ivkrotova/rupaws_dataset
+- "a private dataset of manual evaluation of text detoxification"
+---
+This is a [ruBERT-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) model trained on a mixture of three paraphrase detection datasets:
+- [ru_paraphraser](https://huggingface.co/merionum/ru_paraphraser)
+- [RuPAWS](https://github.com/ivkrotova/rupaws_dataset)
+- A dataset containing crowdsourced evaluation of content preservation in Russian text detoxification by [Dementieva et al., 2022](https://www.dialog-21.ru/media/5755/dementievadplusetal105.pdf).
+
+Training notebook: `task_oriented_TST/similarity/cross_encoders/russian/train_russian_paraphrase_detector__fixed.ipynb` (in a private repo).
+
+Training parameters:
+* optimizer: Adam
+* `lr=1e-5`
+* `batch_size=32`
+* `epochs=3`
+
+ROC AUC on the development data:
+```
+source         score
+detox          0.821665
+paraphraser    0.848287
+rupaws_qqp     0.761481
+rupaws_wiki    0.844093
+```
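
The README reports ROC AUC separately for each development source. As a rough illustration of how such a per-source breakdown could be computed, here is a minimal pure-Python sketch that uses the rank-sum (Mann-Whitney) formulation of ROC AUC. The helper names `roc_auc` and `roc_auc_by_source` and the rows in `dev_rows` are hypothetical and made up for this example; they are not the author's evaluation code or data, and the private training notebook may well use a library routine instead.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum formulation; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both positive and negative examples")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def roc_auc_by_source(rows):
    """Group (source, label, score) rows by source and score each group."""
    groups = defaultdict(lambda: ([], []))
    for source, label, score in rows:
        groups[source][0].append(label)
        groups[source][1].append(score)
    return {src: roc_auc(lbls, scs) for src, (lbls, scs) in groups.items()}


# Made-up rows for illustration only: (source, gold label, predicted probability).
dev_rows = [
    ("detox", 1, 0.9), ("detox", 0, 0.2), ("detox", 1, 0.4), ("detox", 0, 0.6),
    ("paraphraser", 1, 0.8), ("paraphraser", 0, 0.3),
    ("paraphraser", 1, 0.7), ("paraphraser", 0, 0.1),
]

for source, auc in sorted(roc_auc_by_source(dev_rows).items()):
    print(f"{source:<14} {auc:.6f}")
```

Reporting the metric per source rather than pooled keeps a large or easy dataset from masking weaker performance on a harder one, which is presumably why the README lists `rupaws_qqp` separately from `rupaws_wiki`.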