File size: 6,065 Bytes

---
language: ru
pipeline_tag: zero-shot-classification
tags:
- rubert
- russian
- nli
- rte
- zero-shot-classification
widget:
- text: "Я хочу поехать в Австралию"
  candidate_labels: "спорт,путешествия,музыка,кино,книги,наука,политика"
  hypothesis_template: "Тема текста - {}." 
---
# RuBERT for NLI (natural language inference)

This is the [DeepPavlov/rubert-base-cased](https://huggingface.co/DeepPavlov/rubert-base-cased) fine-tuned to predict the logical relationship between two short texts: entailment, contradiction, or neutral.

## Usage
How to run the model for NLI:
```python
# !pip install transformers sentencepiece --quiet
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_checkpoint = 'cointegrated/rubert-base-cased-nli-threeway'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)
if torch.cuda.is_available():
    model.cuda()

text1 = 'Сократ - человек, а все люди смертны.'
text2 = 'Сократ никогда не умрёт.'
with torch.inference_mode():
    out = model(**tokenizer(text1, text2, return_tensors='pt').to(model.device))
    proba = torch.softmax(out.logits, -1).cpu().numpy()[0]
print({v: proba[k] for k, v in model.config.id2label.items()})
# {'entailment': 0.009525929, 'contradiction': 0.9332064, 'neutral': 0.05726764} 
```

You can also use this model for zero-shot short text classification (by labels only), e.g. for sentiment analysis:

```python
def predict_zero_shot(text, label_texts, model, tokenizer, label='entailment', normalize=True):
    label_texts
    tokens = tokenizer([text] * len(label_texts), label_texts, truncation=True, return_tensors='pt', padding=True)
    with torch.inference_mode():
        result = torch.softmax(model(**tokens.to(model.device)).logits, -1)
    proba = result[:, model.config.label2id[label]].cpu().numpy()
    if normalize:
        proba /= sum(proba)
    return proba

classes = ['Я доволен', 'Я недоволен']
predict_zero_shot('Какая гадость эта ваша заливная рыба!', classes, model, tokenizer)
# array([0.05609814, 0.9439019 ], dtype=float32)
predict_zero_shot('Какая вкусная эта ваша заливная рыба!', classes, model, tokenizer)
# array([0.9059292 , 0.09407079], dtype=float32)
```

Alternatively, you can use [Huggingface pipelines](https://huggingface.co/transformers/main_classes/pipelines.html) for inference.

## Sources
The model has been trained on a series of NLI datasets automatically translated to Russian from English.

Most datasets were taken [from the repo of Felipe Salvatore](https://github.com/felipessalvatore/NLI_datasets):
[JOCI](https://github.com/sheng-z/JOCI), 
[MNLI](https://cims.nyu.edu/~sbowman/multinli/), 
[MPE](https://aclanthology.org/I17-1011/), 
[SICK](http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf), 
[SNLI](https://nlp.stanford.edu/projects/snli/).

Some datasets obtained from the original sources:
[ANLI](https://github.com/facebookresearch/anli), 
[NLI-style FEVER](https://github.com/easonnie/combine-FEVER-NSMN/blob/master/other_resources/nli_fever.md),
[IMPPRES](https://github.com/facebookresearch/Imppres).

## Performance

The table below shows ROC AUC for three models on small samples of the DEV sets:
- [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli): a small BERT predicting entailment vs not_entailment
- [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway): a base-sized BERT predicting entailment vs not_entailment
- [threeway](https://huggingface.co/cointegrated/rubert-base-cased-nli-threeway) (**this model**): a base-sized BERT predicting entailment vs contradiction vs neutral

|model      |tiny/entailment|twoway/entailment|threeway/entailment|threeway/contradiction|threeway/neutral|
|-----------|---------------|-----------------|-------------------|-------------------------|-------------------|
|add_one_rte|0.82           |0.90             |0.92               |                         |                   |
|anli_r1    |0.50           |0.68             |0.66               |0.70                     |0.75               |
|anli_r2    |0.55           |0.62             |0.62               |0.62                     |0.69               |
|anli_r3    |0.50           |0.63             |0.59               |0.62                     |0.64               |
|copa       |0.55           |0.60             |0.62               |                         |                   |
|fever      |0.88           |0.94             |0.94               |0.91                     |0.92               |
|help       |0.74           |0.87             |0.46               |                         |                   |
|iie        |0.79           |0.85             |0.54               |                         |                   |
|imppres    |0.94           |0.99             |0.99               |0.99                     |0.99               |
|joci       |0.87           |0.93             |0.93               |0.85                     |0.80               |
|mnli       |0.87           |0.92             |0.93               |0.89                     |0.86               |
|monli      |0.94           |1.00             |0.67               |                         |                   |
|mpe        |0.82           |0.90             |0.90               |0.91                     |0.80               |
|scitail    |0.80           |0.96             |0.85               |                         |                   |
|sick       |0.97           |0.99             |0.99               |0.98                     |0.96               |
|snli       |0.95           |0.98             |0.98               |0.99                     |0.97               |
|terra      |0.73           |0.93             |0.93               |                         |                   |