
This is a version of the paraphrase detector by DeepPavlov (details in the documentation), ported to the Transformers format. All credit goes to the authors of DeepPavlov.

The model has been trained on the dataset from http://paraphraser.ru/.

It classifies pairs of texts as paraphrases (class 1) or non-paraphrases (class 0).

import torch
from transformers import AutoModelForSequenceClassification, BertTokenizer
model_name = 'cointegrated/rubert-base-cased-dp-paraphrase-detection'
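model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Move the model to the GPU when one is available; otherwise it stays on the CPU
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')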
tokenizer = BertTokenizer.from_pretrained(model_name)
text1 = 'Сегодня на улице хорошая погода'  # "The weather outside is good today"
text2 = 'Сегодня на улице отвратительная погода'  # "The weather outside is disgusting today"
batch = tokenizer(text1, text2, return_tensors='pt').to(model.device)
with torch.inference_mode():
    proba = torch.softmax(model(**batch).logits, -1).cpu().numpy()
print(proba)
# [[0.44876656 0.5512334 ]]
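
The printed row is ordered by class index, so the second value is the probability of class 1 (paraphrase). A minimal sketch of turning such a row into a label is shown below; the helper name `label_from_proba` and the 0.5 threshold are illustrative assumptions, not part of the original model card or of DeepPavlov.

```python
def label_from_proba(row, threshold=0.5):
    """Return (label, p_paraphrase) for one [P(class 0), P(class 1)] row.

    `threshold` is an assumed cut-off; tune it on held-out data if needed.
    """
    p_paraphrase = float(row[1])
    label = 1 if p_paraphrase >= threshold else 0
    return label, p_paraphrase


# Row copied from the sample output above.
print(label_from_proba([0.44876656, 0.5512334]))
# (1, 0.5512334)
```

With the `proba` array from the snippet above, this could be called as `label_from_proba(proba[0])`. A stricter threshold, such as 0.6, would label the same pair as class 0.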