---
license: bigscience-bloom-rail-1.0
datasets:
- xnli
language:
- fr
- en
pipeline_tag: zero-shot-classification
---

# Presentation
We introduce the Bloomz-7b1-mt-NLI model, fine-tuned from the [Bloomz-7b1-mt-dpo-chat](https://huggingface.co/cmarkea/bloomz-7b1-mt-dpo-chat) foundation model.
This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task involves determining the semantic relationship
between a premise and a hypothesis, typically expressed as a pair of sentences.

The goal is to predict textual entailment (does sentence A imply, contradict, or neither imply nor contradict sentence B?). This is a classification task: given
two sentences, predict one of three labels.
If sentence A is called the *premise* and sentence B the *hypothesis*, then the goal of the model is to estimate:
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$

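For illustration, here is a minimal sketch of scoring a single premise/hypothesis pair directly with the sequence-classification head. The example sentences are hypothetical, and the label ordering is an assumption, so the sketch reads it from `model.config.id2label`:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "cmarkea/bloomz-7b1-mt-nli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "Le chat dort sur le canapé."  # hypothetical example
hypothesis = "Un animal se repose."      # hypothetical example

# Encode the pair and take the softmax over the three NLI classes.
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1).squeeze()

# model.config.id2label holds the actual index-to-label mapping.
for idx, label in model.config.id2label.items():
    print(f"{label}: {probs[idx].item():.3f}")
```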
### Language-agnostic approach
It should be noted that, during training, the hypothesis and the premise are each randomly drawn in English or French, so each of the four language combinations occurs with a probability of 25%.

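As a hypothetical sketch of that pairing scheme (assuming each XNLI example is available in both languages; the function and field names are illustrative, not the actual training code):

```python
import random

def sample_language_pair(example_fr, example_en):
    # Each side is drawn independently, so FR/FR, FR/EN, EN/FR and EN/EN
    # each occur with probability 25%.
    premise = random.choice([example_fr["premise"], example_en["premise"]])
    hypothesis = random.choice([example_fr["hypothesis"], example_en["hypothesis"]])
    return premise, hypothesis
```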
### Performance

| **class**         | **precision (%)** | **f1-score (%)** | **support** |
| :---------------: | :---------------: | :--------------: | :---------: |
| **global**        | 83.31             | 83.02            | 5,010       |
| **contradiction** | 81.27             | 86.63            | 1,670       |
| **entailment**    | 87.54             | 83.57            | 1,670       |
| **neutral**       | 81.13             | 78.86            | 1,670       |

### Benchmark

Here is the performance when both the hypothesis and the premise are in French:

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 77.45 | 66.24 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 81.72 | 72.67 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 83.43 | 75.15 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.70 | 53.57 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 81.08 | 71.66 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 83.13 | 74.89 |

And here, with the hypothesis in French and the premise in English (cross-language context):

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 16.89 | -26.82 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 74.59 | 61.97 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 85.15 | 77.74 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 68.84 | 53.55 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 82.12 | 73.22 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 85.43 | 78.25 |

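The accuracy and MCC figures above could be computed along these lines; scikit-learn is an assumption here, as the card does not state which tooling was used:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical gold labels and model predictions over XNLI test pairs.
y_true = ["entailment", "neutral", "contradiction", "entailment"]
y_pred = ["entailment", "neutral", "neutral", "entailment"]

print(f"accuracy (%): {100 * accuracy_score(y_true, y_pred):.2f}")
print(f"MCC (x100):   {100 * matthews_corrcoef(y_true, y_pred):.2f}")
```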
# Zero-shot Classification
The primary interest of training such models lies in their zero-shot classification performance. This means the model is able to classify any text with any set of labels
without specific training. What sets the Bloomz-NLI LLMs apart in this domain is their ability to model and extract information from significantly more complex
and lengthy text structures compared to models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
Here *i* denotes a hypothesis built from a template (for example, "This text is about {}.") and a candidate label drawn from the set $\mathcal{C}$ ("cinema", "politics", etc.), so the set
of hypotheses is {"This text is about cinema.", "This text is about politics.", ...}. It is these hypotheses that are scored against the premise, which
is the sentence we aim to classify.

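Here is a minimal sketch that mirrors this formula by hand, scoring each candidate hypothesis and softmaxing the entailment probabilities across candidates. The example premise is hypothetical, and the `"entailment"` label key is an assumption; check `model.config.label2id` for the actual names:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "cmarkea/bloomz-7b1-mt-nli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "Le nouveau film sort en salles mercredi."  # hypothetical example
labels = ["cinéma", "politique"]
hypotheses = [f"Ce texte parle de {label}." for label in labels]

# P(premise = entailment | hypothesis = i) for each candidate hypothesis i.
entail_id = model.config.label2id["entailment"]  # assumed key name
entail_probs = []
for hypothesis in hypotheses:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze()
    entail_probs.append(logits.softmax(dim=-1)[entail_id])

# Softmax the entailment probabilities across candidates, as in the formula.
scores = torch.stack(entail_probs).softmax(dim=0)
print(dict(zip(labels, scores.tolist())))
```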
### Performance

The model is evaluated on a sentiment analysis task using the French film review dataset [Allociné](https://huggingface.co/datasets/allocine). The dataset is labeled
into 2 classes, positive comments and negative comments. We use the hypothesis template "Ce commentaire est {}." and the candidate classes "positif" and "negatif".

| **model** | **accuracy (%)** | **MCC (x100)** |
| :--------------: | :--------------: | :------------: |
| [cmarkea/distilcamembert-base-nli](https://huggingface.co/cmarkea/distilcamembert-base-nli) | 80.59 | 63.71 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 86.37 | 73.74 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 84.97 | 70.05 |
| [cmarkea/bloomz-560m-nli](https://huggingface.co/cmarkea/bloomz-560m-nli) | 71.13 | 46.30 |
| [cmarkea/bloomz-3b-nli](https://huggingface.co/cmarkea/bloomz-3b-nli) | 89.06 | 78.10 |
| [cmarkea/bloomz-7b1-mt-nli](https://huggingface.co/cmarkea/bloomz-7b1-mt-nli) | 95.12 | 90.27 |

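This evaluation setup could be reproduced with the zero-shot pipeline; a minimal sketch on a single review (the dataset split and field name are assumptions):

```python
from datasets import load_dataset
from transformers import pipeline

classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/bloomz-7b1-mt-nli"
)

# Score one Allociné review with the template and classes stated above.
review = load_dataset("allocine", split="test")[0]["review"]
result = classifier(
    sequences=review,
    candidate_labels="positif, negatif",
    hypothesis_template="Ce commentaire est {}."
)
print(result["labels"][0])  # predicted sentiment class
```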
# How to use Bloomz-7b1-mt-NLI

```python
from transformers import pipeline

classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/bloomz-7b1-mt-nli"
)
result = classifier(
    sequences="Le style très cinéphile de Quentin Tarantino "
              "se reconnaît entre autres par sa narration postmoderne "
              "et non linéaire, ses dialogues travaillés souvent "
              "émaillés de références à la culture populaire, et ses "
              "scènes hautement esthétiques mais d'une violence "
              "extrême, inspirées de films d'exploitation, d'arts "
              "martiaux ou de western spaghetti.",
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}."
)

result
{"labels": ["cinéma",
            "littérature",
            "technologie",
            "politique"],
 "scores": [0.8745610117912292,
            0.10403601825237274,
            0.014962797053158283,
            0.0064402492716908455]}

# Resilience in cross-language French/English context
result = classifier(
    sequences="Quentin Tarantino's very cinephile style is "
              "recognized, among other things, by his postmodern and "
              "non-linear narration, his elaborate dialogues often "
              "peppered with references to popular culture, and his "
              "highly aesthetic but extremely violent scenes, inspired by "
              "exploitation films, martial arts or spaghetti western.",
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}."
)

result
{"labels": ["cinéma",
            "littérature",
            "technologie",
            "politique"],
 "scores": [0.9314399361610413,
            0.04960821941494942,
            0.013468802906572819,
            0.005483036395162344]}
```
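
Since this is a 7-billion-parameter checkpoint, loading it in half precision on GPU can help with memory; a minimal sketch, assuming a CUDA device and the `accelerate` package installed:

```python
import torch
from transformers import pipeline

classifier = pipeline(
    task="zero-shot-classification",
    model="cmarkea/bloomz-7b1-mt-nli",
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    device_map="auto"           # requires the accelerate package
)
```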