---
license: mit
datasets:
- xnli
language:
- en
metrics:
- accuracy
pipeline_tag: zero-shot-classification
---

# XLM-ROBERTA-BASE-XNLI-EN

## Model description
This model is based on XLM-RoBERTa-base, whose pre-training was continued on a large corpus of tweets in multiple languages.
It was developed following a strategy similar to the one introduced in the [Tweet Eval](https://github.com/cardiffnlp/tweeteval) framework.
The model was then fine-tuned on the English portion of the XNLI training dataset.

## Intended Usage

This model was developed for zero-shot text classification, with a focus on hate speech detection. It targets English, as it was fine-tuned on data in that language. Since the base model was pre-trained on 100 different languages, it has shown some effectiveness in other languages as well; please refer to the list of languages in the [XLM Roberta paper](https://arxiv.org/abs/1911.02116).

### Usage with Zero-Shot Classification pipeline
```python
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="morit/english_xlm_xnli")
```

After loading the model, you can classify sequences in the languages mentioned above. Specify the sequence to classify, a set of candidate labels, and a hypothesis template that turns each label into a natural-language hypothesis.

```python
sequence_to_classify = "I think Rishi Sunak is going to win the elections"

# specify candidate labels and a hypothesis template
candidate_labels = ["politics", "football"]
hypothesis_template = "This example is {}"

# classify using the information provided
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)

# Output
# {'sequence': 'I think Rishi Sunak is going to win the elections',
#  'labels': ['politics', 'football'],
#  'scores': [0.7982912659645081, 0.20170868933200836]}
```
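
Since the model is aimed at hate speech detection, a typical call looks like the sketch below. The example text, labels, and hypothesis template here are illustrative only and were not part of the original training setup:

```python
sequence_to_classify = "You people do not belong in this country"

candidate_labels = ["hate speech", "not hate speech"]
hypothesis_template = "This text is {}"

classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
```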


## Training
The base model was pre-trained on 100 languages and then further trained on 198M multilingual tweets, as described in the original [paper](https://arxiv.org/abs/2104.12250). It was subsequently fine-tuned on the English portion of the XNLI training set, which is in fact the original MNLI training data. Fine-tuning ran for 3 epochs with the following hyperparameters:
- learning rate: 5e-5
- batch size: 32
- max sequence length: 128

on a single GPU (NVIDIA GeForce RTX 3090), resulting in a training time of 1h 47min.
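
For reference, a fine-tuning run with these hyperparameters could look roughly like the sketch below. This is not the original training script: the starting checkpoint (`cardiffnlp/twitter-xlm-roberta-base`, the Twitter-adapted XLM-R from the linked paper) and the `Trainer` setup are assumptions.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# English XNLI; its train split is the original MNLI training data
dataset = load_dataset("xnli", "en")

# Assumed starting checkpoint (Twitter-adapted XLM-R, not confirmed by this card)
checkpoint = "cardiffnlp/twitter-xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

def tokenize(batch):
    # premise/hypothesis pairs, truncated to the stated max sequence length
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="xlm-roberta-xnli-en",
    learning_rate=5e-5,               # as listed above
    per_device_train_batch_size=32,   # as listed above
    num_train_epochs=3,               # as listed above
    evaluation_strategy="epoch",      # evaluate after each epoch (see Evaluation)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,              # enables dynamic padding of batches
)
trainer.train()
```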

## Evaluation
The model was evaluated on the validation set of the XNLI corpus after each epoch, and on the test set of the XNLI corpus at the end of training.
On the test set, the model reached an accuracy of
```
predict_accuracy = 83.17 %
```
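
The test-set figure can be reproduced with a loop like the following sketch. It assumes the model's label order matches XNLI's (0 = entailment, 1 = neutral, 2 = contradiction); verify this against `model.config.id2label` before relying on it:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

test_set = load_dataset("xnli", "en", split="test")
tokenizer = AutoTokenizer.from_pretrained("morit/english_xlm_xnli")
model = AutoModelForSequenceClassification.from_pretrained("morit/english_xlm_xnli")
model.eval()

correct = 0
for example in test_set:
    inputs = tokenizer(example["premise"], example["hypothesis"],
                       truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        # assumed label order: must match the dataset's (check model.config.id2label)
        prediction = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(prediction == example["label"])

print(f"accuracy = {correct / len(test_set):.4f}")
```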