---
license: bigscience-bloom-rail-1.0
datasets:
- xnli
language:
- fr
- en
pipeline_tag: zero-shot-classification
---
# Presentation
We introduce the Bloomz-560m-NLI model, fine-tuned from the [Bloomz-560m-chat-dpo](https://huggingface.co/cmarkea/bloomz-560m-dpo-chat) foundation model.
It is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. NLI consists in determining the semantic relationship between two
sentences: a *premise* (sentence A) and a *hypothesis* (sentence B). The goal is to predict textual entailment (does sentence A imply, contradict, or stand
neutral toward sentence B?), making this a three-class classification task. The model therefore estimates the following:
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
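The distribution above can be sketched in plain Python: given the three raw logits produced by a classification head, a softmax yields the per-class probabilities. The label order below is an assumption for illustration; check `model.config.id2label` on the actual checkpoint.

```python
import math

# Assumed label order; verify against model.config.id2label on the checkpoint.
LABELS = ["contradiction", "entailment", "neutral"]

def nli_distribution(logits):
    """Numerically stable softmax over the three NLI logits, keyed by label."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}

# Made-up logits favouring entailment:
dist = nli_distribution([-1.2, 2.3, 0.1])
```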
### Language-agnostic approach
Hypotheses and premises are drawn independently in English or French, so each of the four language combinations (fr/fr, fr/en, en/fr, en/en) occurs with a probability of 25%.
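As a rough illustration of this mixing (a sketch, not the actual training script), drawing the premise and hypothesis languages independently and uniformly gives each of the four combinations a 25% probability:

```python
import random

def sample_language_pair(rng=random):
    """Draw the premise and hypothesis languages independently and uniformly."""
    premise_lang = rng.choice(["fr", "en"])
    hypothesis_lang = rng.choice(["fr", "en"])
    return premise_lang, hypothesis_lang
```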
### Performance
| **class** | **precision (%)** | **f1-score (%)** | **support** |
| :----------------: | :---------------: | :--------------: | :---------: |
| **global** | 69.20 | 68.35 | 5,010 |
| **contradiction** | 63.66 | 70.60 | 1,670 |
| **entailment** | 73.45 | 73.01 | 1,670 |
| **neutral** | 70.75 | 61.45 | 1,670 |
### Benchmark
Here are the performances for both the hypothesis and premise in French:
| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45 | 66.24 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72 | 72.67 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89 |
And now the hypothesis in French and the premise in English (cross-language context):
| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 16.89 | -26.82 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 74.59 | 61.97 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 85.15 | 77.74 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.84 | 53.55 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 82.12 | 73.22 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 85.43 | 78.25 |
# Zero-shot Classification
The primary interest of training such models lies in their zero-shot classification performance: the model can classify any text against any set of labels without
task-specific training. What sets the Bloomz-560m-NLI LLM apart in this domain is its ability to model and extract information from significantly more complex
and lengthy text structures than encoder models such as BERT, RoBERTa, or CamemBERT.
The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
With *i* a hypothesis built from a template (for example, "This text is about {}.") filled with one of the candidate labels in $\mathcal{C}$ ("cinema",
"politics", etc.), the set of hypotheses becomes {"This text is about cinema.", "This text is about politics.", ...}. Each of these hypotheses is scored
against the premise, which is the text we aim to classify.
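The aggregation above can be sketched in plain Python; the per-hypothesis entailment scores below are made up for illustration:

```python
import math

def zero_shot_scores(entailment_scores):
    """Softmax over the per-candidate entailment scores, as in the formula above."""
    m = max(entailment_scores.values())
    exps = {lbl: math.exp(s - m) for lbl, s in entailment_scores.items()}
    total = sum(exps.values())
    return {lbl: e / total for lbl, e in exps.items()}

template = "Ce texte parle de {}."
candidates = ["cinéma", "politique"]
hypotheses = [template.format(c) for c in candidates]
# Hypothetical entailment scores for each filled-in hypothesis:
scores = zero_shot_scores({"cinéma": 0.9, "politique": 0.2})
```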
### Performance
The model is evaluated on sentiment analysis using the French film review dataset [Allociné](https://huggingface.co/datasets/allocine). The dataset contains
20,000 reviews labeled with 2 classes, positive and negative. We use the hypothesis template "Ce commentaire est {}." ("This review is {}.") and the candidate
classes "positif" (positive) and "negatif" (negative).
| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 80.59 | 63.71 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 86.37 | 73.74 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 84.97 | 70.05 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 71.13 | 46.30 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 89.06 | 78.10 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 95.12 | 90.27 |
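The MCC column in the tables above is the Matthews correlation coefficient (scaled by 100). A minimal pure-Python sketch of the binary case used in the Allociné evaluation:

```python
import math

def mcc(y_true, y_pred):
    """Binary Matthews correlation coefficient; returns 0.0 for degenerate cases."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

scikit-learn's `matthews_corrcoef` implements the same metric, including the multi-class generalization.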
# How to use Bloomz-560m-NLI
```python
from transformers import pipeline
classifier = pipeline(
task='zero-shot-classification',
model="cmarkea/bloomz-560m-nli"
)
result = classifier(
sequences="Le style très cinéphile de Quentin Tarantino "
"se reconnaît entre autres par sa narration postmoderne "
"et non linéaire, ses dialogues travaillés souvent "
"émaillés de références à la culture populaire, et ses "
"scènes hautement esthétiques mais d'une violence "
"extrême, inspirées de films d'exploitation, d'arts "
"martiaux ou de western spaghetti.",
candidate_labels="cinéma, technologie, littérature, politique",
hypothesis_template="Ce texte parle de {}."
)
result
{"labels": ["cinéma",
"littérature",
"technologie",
"politique"],
"scores": [0.6797838807106018,
0.1440986692905426,
0.09773541986942291,
0.07838203758001328]}
# Resilience in cross-language French/English context
result = classifier(
sequences="Quentin Tarantino's very cinephile style is "
"recognized, among other things, by his postmodern and "
"non-linear narration, his elaborate dialogues often "
"peppered with references to popular culture, and his "
"highly aesthetic but extremely violent scenes, inspired by "
"exploitation films, martial arts or spaghetti western.",
candidate_labels="cinéma, technologie, littérature, politique",
hypothesis_template="Ce texte parle de {}."
)
result
{"labels": ["cinéma",
"littérature",
"technologie",
"politique"],
"scores": [0.6970456838607788,
0.17720822989940643,
0.06449680775403976,
0.0612492673099041]}
```
# Citation
```bibtex
@online{DeBloomzNLI,
AUTHOR = {Cyrile Delestre},
URL = {https://huggingface.co/cmarkea/bloomz-560m-nli},
YEAR = {2024},
KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```