---
license: bigscience-bloom-rail-1.0
datasets:
- xnli
language:
- fr
- en
pipeline_tag: zero-shot-classification
---

# Presentation
We introduce the Bloomz-560m-NLI model, fine-tuned from the [Bloomz-560m-chat-dpo](https://huggingface.co/cmarkea/bloomz-560m-dpo-chat) foundation model.
This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task involves determining the semantic relationship
between a hypothesis and a set of premises, often expressed as pairs of sentences.

The goal is to predict textual entailment: does sentence A entail sentence B, contradict it, or neither? This is a classification task: given two sentences,
predict one of three labels.
Sentence A is called the *premise* and sentence B the *hypothesis*. The modeling objective is then:
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$

### Language-agnostic approach
The language of each hypothesis and each premise is chosen at random between English and French, so each of the four language combinations occurs with a probability of 25%.
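
The sampling scheme can be sketched as follows. This is a minimal illustration that assumes the premise language and the hypothesis language are drawn independently and uniformly, which yields the stated 25% per combination. The function name `sample_language_pair` and the seed are hypothetical and not taken from the training code.

```python
import random
from collections import Counter

LANGUAGES = ("en", "fr")


def sample_language_pair(rng):
    """Draw (premise_language, hypothesis_language) independently and uniformly."""
    return rng.choice(LANGUAGES), rng.choice(LANGUAGES)


# Hypothetical seed, used only to make the illustration reproducible.
rng = random.Random(0)
n_samples = 10_000
counts = Counter(sample_language_pair(rng) for _ in range(n_samples))

# Each of the four combinations should appear in roughly 25% of the draws.
for pair, n in sorted(counts.items()):
    print(pair, f"{n / n_samples:.1%}")
```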

### Performance

| **class**          | **precision (%)** | **f1-score (%)** | **support** |
| :----------------: | :---------------: | :--------------: | :---------: |
| **global**         | 69.20             | 68.35            | 5,010       |
| **contradiction**  | 63.66             | 70.60            | 1,670       | 
| **entailment**     | 73.45             | 73.01            | 1,670       |
| **neutral**        | 70.75             | 61.45            | 1,670       |

### Benchmark

Performance when both the hypothesis and the premise are in French:

| **model**          | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45     | 66.24         |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72     | 72.67         |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15     |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57     |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66     |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89     |

And now the hypothesis in French and the premise in English (cross-language context):



# Zero-shot Classification
The primary appeal of training such models lies in their zero-shot classification performance: the model can classify any text against any set of labels
without task-specific training. What sets the Bloomz-560m-NLI LLMs apart in this realm is their ability to model and extract information from significantly more complex
and lengthy text structures compared to models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
Here *i* denotes a hypothesis built from a template (for example, "This text is about {}.") filled with one of the *#C* candidate labels ("cinema", "politics", etc.), so the set
of hypotheses is {"This text is about cinema.", "This text is about politics.", ...}. Each of these hypotheses is scored against the premise, which
is the sentence we aim to classify.
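
The formula above can be sketched in plain Python. This is a minimal illustration, not the pipeline's implementation: the helper names `build_hypotheses` and `zero_shot_scores` are hypothetical, and the entailment scores are made-up numbers standing in for what an NLI model would return for each (premise, hypothesis) pair.

```python
import math


def build_hypotheses(template, labels):
    """Fill the template once per candidate label."""
    return [template.format(label) for label in labels]


def zero_shot_scores(entailment_scores):
    """Softmax over the per-hypothesis entailment scores."""
    # Subtracting the maximum keeps the exponentials numerically stable
    # without changing the result.
    shift = max(entailment_scores)
    exps = [math.exp(score - shift) for score in entailment_scores]
    total = sum(exps)
    return [value / total for value in exps]


labels = ["cinema", "politics"]
hypotheses = build_hypotheses("This text is about {}.", labels)

# Hypothetical entailment scores, one per hypothesis (assumption for illustration).
entailment_scores = [2.0, 0.5]
probs = zero_shot_scores(entailment_scores)

for hypothesis, prob in zip(hypotheses, probs):
    print(f"{hypothesis!r}: {prob:.3f}")
```

The label with the highest normalized score is the predicted class; the scores sum to 1 across the candidate labels.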

### Performance

The model is evaluated on sentiment analysis using reviews from the French film review site [Allociné](https://huggingface.co/datasets/allocine). The dataset is labeled
with 2 classes, positive comments and negative comments. We use the hypothesis template "Ce commentaire est {}." and the candidate classes "positif" and "negatif".

| **model**     | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 80.59         | 63.71         |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 86.37     | 73.74     |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 84.97         | 70.05         |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 71.13 | 46.30     |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 89.06 | 78.10     |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 95.12 | 90.27     |
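
For readers who want to reproduce the two metrics on their own predictions, here is a minimal sketch. It assumes that "MCC" in the tables denotes the Matthews correlation coefficient and covers the binary case only; the function names and the toy labels are illustrative, not the evaluation code behind the reported numbers.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (1 = positive review, 0 = negative review), for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

print(f"accuracy (%): {100 * accuracy(y_true, y_pred):.2f}")
print(f"MCC (x100):   {100 * binary_mcc(y_true, y_pred):.2f}")
```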

# How to use Bloomz-560m-NLI

```python
from transformers import pipeline

classifier = pipeline(
    task='zero-shot-classification',
    model="cmarkea/bloomz-560m-nli"
)
# Classify a French text against French candidate labels.
result = classifier(
    sequences="Le style très cinéphile de Quentin Tarantino "
    "se reconnaît entre autres par sa narration postmoderne "
    "et non linéaire, ses dialogues travaillés souvent "
    "émaillés de références à la culture populaire, et ses "
    "scènes hautement esthétiques mais d'une violence "
    "extrême, inspirées de films d'exploitation, d'arts "
    "martiaux ou de western spaghetti.",
    candidate_labels=["cinéma", "technologie", "littérature", "politique"],
    hypothesis_template="Ce texte parle de {}."
)

# Reported output (labels sorted by decreasing score):
print(result["labels"])
# ['cinéma', 'littérature', 'technologie', 'politique']
print(result["scores"])
# [0.6797838807106018, 0.1440986692905426,
#  0.09773541986942291, 0.07838203758001328]

# Resilience in cross-language French/English context
# The text is in English while the labels and template stay in French.
result = classifier(
    sequences="Quentin Tarantino's very cinephile style is "
    "recognized, among other things, by his postmodern and "
    "non-linear narration, his elaborate dialogues often "
    "peppered with references to popular culture, and his "
    "highly aesthetic but extremely violent scenes, inspired by "
    "exploitation films, martial arts or spaghetti western.",
    candidate_labels=["cinéma", "technologie", "littérature", "politique"],
    hypothesis_template="Ce texte parle de {}."
)

# Reported output (labels sorted by decreasing score):
print(result["labels"])
# ['cinéma', 'littérature', 'technologie', 'politique']
print(result["scores"])
# [0.6970456838607788, 0.17720822989940643,
#  0.06449680775403976, 0.0612492673099041]
```