metadata

license: bigscience-bloom-rail-1.0
datasets:
  - xnli
language:
  - fr
  - en
pipeline_tag: zero-shot-classification

Presentation

We introduce the Bloomz-560m-NLI model, fine-tuned from the Bloomz-560m-chat-dpo foundation model. The model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. NLI consists of determining the semantic relationship between a hypothesis and a set of premises, usually expressed as a pair of sentences.

The goal is to predict textual entailment: does sentence A entail sentence B, contradict it, or neither? This is a classification task: given two sentences, predict one of three labels. Sentence A is called the premise and sentence B the hypothesis, so the modeling objective is: $P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$
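To make the three-way decision concrete, here is a minimal sketch of how a prediction could be read from the classifier's raw scores. The label order and the logits are assumptions chosen for illustration; the real mapping should be taken from the model's configuration.

```python
import math

# Assumed label order, for illustration only (not confirmed by this card).
LABELS = ("contradiction", "entailment", "neutral")


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(logits):
    """Return the most probable label and the full distribution over labels."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda k: probs[k])
    return LABELS[best], dict(zip(LABELS, probs))


# Hypothetical logits for a single (premise, hypothesis) pair.
label, distribution = predict_relation([-1.2, 2.3, 0.4])
print(label)  # entailment
```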

Language-agnostic approach

Hypotheses and premises are each randomly drawn in English or French, so each of the four language combinations occurs with a probability of 25%.
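The sampling described above could look like the following sketch. It assumes that each example is available in both languages and that the two languages are drawn independently and uniformly; the record layout and the function names are hypothetical, not taken from the training code.

```python
import random
from collections import Counter


def sample_language_pair(rng):
    """Pick the premise and hypothesis languages independently and uniformly."""
    premise_lang = rng.choice(("en", "fr"))
    hypothesis_lang = rng.choice(("en", "fr"))
    return premise_lang, hypothesis_lang


def build_pair(example, rng):
    """Select one language version of each sentence from a parallel example.

    `example` is a hypothetical record holding both translations:
    {"premise": {"en": ..., "fr": ...}, "hypothesis": {...}, "label": ...}
    """
    p_lang, h_lang = sample_language_pair(rng)
    return {
        "premise": example["premise"][p_lang],
        "hypothesis": example["hypothesis"][h_lang],
        "label": example["label"],
        "languages": (p_lang, h_lang),
    }


# Empirical check: the four combinations should each appear about 25% of the time.
rng = random.Random(0)
counts = Counter(sample_language_pair(rng) for _ in range(10_000))
shares = {pair: n / 10_000 for pair, n in counts.items()}
print(shares)
```

Two independent fair draws give the four combinations (en/en, en/fr, fr/en, fr/fr) equal weight, which matches the 25% figure.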

Dataset

Performance

class	precision (%)	f1-score (%)	support
global	69.20	68.35	5,010
contradiction	63.66	70.60	1,670
entailment	73.45	73.01	1,670
neutral	70.75	61.45	1,670

Benchmark

model	accuracy (%)	MCC (x100)
cmarkea/distilcamembert-base-nli	77.45	66.24
BaptisteDoyen/camembert-base-xnli	81.72	72.67
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	83.43	75.15
cmarkea/bloomz-560m-nli	68.70	53.57
cmarkea/bloomz-3b-nli	81.08	71.66
cmarkea/bloomz-7b1-mt-nli	83.13	74.89
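For readers who want to reproduce the two benchmark metrics, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) in plain Python. It assumes the usual multiclass generalization of MCC; the card does not state which implementation was used, and the labels in the example are toy data, not benchmark results.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, in [-1, 1]."""
    s = len(y_true)                                      # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correct predictions
    true_counts = Counter(y_true)                        # t_k: occurrences of class k
    pred_counts = Counter(y_pred)                        # p_k: predictions of class k
    labels = set(true_counts) | set(pred_counts)
    cross = sum(true_counts[k] * pred_counts[k] for k in labels)
    denom = math.sqrt(
        (s * s - sum(n * n for n in pred_counts.values()))
        * (s * s - sum(n * n for n in true_counts.values()))
    )
    return 0.0 if denom == 0 else (c * s - cross) / denom


# Toy labels for illustration only.
y_true = ["entailment", "entailment", "contradiction", "contradiction", "neutral", "neutral"]
y_pred = ["entailment", "neutral", "contradiction", "contradiction", "neutral", "entailment"]
print(accuracy(y_true, y_pred), multiclass_mcc(y_true, y_pred))
```

The tables report both values multiplied by 100.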

Zero-shot Classification

The primary appeal of training such models lies in their zero-shot classification performance: the model can classify any text against any set of labels without task-specific training. What sets the Bloomz-560m-NLI LLMs apart in this realm is their ability to model and extract information from significantly more complex and lengthy text structures than models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:

$P(hypothesis=i\in\mathcal{C}\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$

Here i is a hypothesis built from a template (for example, "This text is about {}.") and one of the $\#\mathcal{C}$ candidate labels ("cinema", "politics", etc.), so the set of hypotheses is {"This text is about cinema.", "This text is about politics.", ...}. These hypotheses are each scored against the premise, which is the sentence we aim to classify.
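The formula above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline's internal code: `toy_entailment_prob` is a hypothetical stand-in for the NLI model's entailment probability, and the premise and labels are made up.

```python
import math


def build_hypotheses(template, labels):
    """Fill the template once per candidate label."""
    return [template.format(label) for label in labels]


def zero_shot_scores(premise, labels, template, entailment_prob):
    """Softmax over the entailment scores of all candidate hypotheses."""
    hypotheses = build_hypotheses(template, labels)
    scores = [entailment_prob(premise, h) for h in hypotheses]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def toy_entailment_prob(premise, hypothesis):
    """Hypothetical scorer: high when the hypothesis topic occurs in the premise."""
    topic = hypothesis.rstrip(".").split()[-1]
    return 0.9 if topic in premise.lower() else 0.1


probs = zero_shot_scores(
    premise="A review of the latest cinema releases.",
    labels=["cinema", "politics"],
    template="This text is about {}.",
    entailment_prob=toy_entailment_prob,
)
print(probs)
```

In practice the scorer would be the fine-tuned model's entailment output for each (premise, hypothesis) pair, which is what the `zero-shot-classification` pipeline shown further down handles internally.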

Performance

model	accuracy (%)	MCC (x100)
cmarkea/distilcamembert-base-nli	80.59	63.71
BaptisteDoyen/camembert-base-xnli	86.37	73.74
MoritzLaurer/mDeBERTa-v3-base-mnli-xnli	84.97	70.05
cmarkea/bloomz-560m-nli	71.13	46.30
cmarkea/bloomz-3b-nli	89.06	78.10
cmarkea/bloomz-7b1-mt-nli	95.12	90.27

How to use Bloomz-560m-NLI

from transformers import pipeline

classifier = pipeline(
    task='zero-shot-classification',
    model="cmarkea/bloomz-560m-nli"
)
# French premise, French labels and French hypothesis template.
result = classifier(
    sequences="Le style très cinéphile de Quentin Tarantino "
    "se reconnaît entre autres par sa narration postmoderne "
    "et non linéaire, ses dialogues travaillés souvent "
    "émaillés de références à la culture populaire, et ses "
    "scènes hautement esthétiques mais d'une violence "
    "extrême, inspirées de films d'exploitation, d'arts "
    "martiaux ou de western spaghetti.",
    candidate_labels=["cinéma", "technologie", "littérature", "politique"],
    hypothesis_template="Ce texte parle de {}."
)

result
# {"labels": ["cinéma",
#             "littérature",
#             "technologie",
#             "politique"],
#  "scores": [0.6797838807106018,
#             0.1440986692905426,
#             0.09773541986942291,
#             0.07838203758001328]}

# Resilience in cross-language French/English context
# English premise with French labels and a French hypothesis template.
result = classifier(
    sequences="Quentin Tarantino's very cinephile style is "
    "recognized, among other things, by his postmodern and "
    "non-linear narration, his elaborate dialogues often "
    "peppered with references to popular culture, and his "
    "highly aesthetic but extremely violent scenes, inspired by "
    "exploitation films, martial arts or spaghetti western.",
    candidate_labels=["cinéma", "technologie", "littérature", "politique"],
    hypothesis_template="Ce texte parle de {}."
)

result
# {"labels": ["cinéma",
#             "littérature",
#             "technologie",
#             "politique"],
#  "scores": [0.6970456838607788,
#             0.17720822989940643,
#             0.06449680775403976,
#             0.0612492673099041]}