Update README.md
README.md
CHANGED
@@ -8,15 +8,37 @@ language:
-We introduce the Bloomz-3b-NLI model, fine-tuned on the [Bloomz-3b-dpo
-With *i* representing a hypothesis composed of a template (for example, "This text is about {}.") and candidate labels ("cinema", "politics", etc.), the set
pipeline_tag: zero-shot-classification
---

+# Presentation
+We introduce the Bloomz-3b-NLI model, fine-tuned from the [Bloomz-3b-dpo-chat](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) foundation model.
+This model is trained on a Natural Language Inference (NLI) task in a language-agnostic manner. The NLI task involves determining the semantic relationship
+between a hypothesis and a set of premises, often expressed as pairs of sentences.

+The goal is to predict textual entailment (does sentence A imply sentence B, contradict it, or neither?), which makes it a classification task (given two sentences, predict one of
+three labels).
+If sentence A is called the *premise* and sentence B the *hypothesis*, the modeling objective is:
+$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
+
+### Language-agnostic approach
+Hypotheses and premises are each randomly drawn in English or French, so each of the four language combinations occurs with a probability of 25%.
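As a rough illustration of the sampling described above, the sketch below enumerates the four (premise language, hypothesis language) combinations. It assumes each side's language is drawn independently and uniformly, which is one way to obtain 25% per combination. The names `LANGUAGES` and `pair_probabilities` are hypothetical and this is not the authors' data pipeline.

```python
from fractions import Fraction
from itertools import product

# Assumption: the premise and hypothesis languages are drawn
# independently and uniformly from this set.
LANGUAGES = ("en", "fr")


def pair_probabilities(languages=LANGUAGES):
    """Return the probability of every (premise, hypothesis) language pair."""
    p = Fraction(1, len(languages))
    return {pair: p * p for pair in product(languages, repeat=2)}


probs = pair_probabilities()
for (premise_lang, hypothesis_lang), prob in probs.items():
    print(premise_lang, hypothesis_lang, float(prob))
```

With two languages, the four pairs each receive probability 1/4, matching the 25% stated above.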
+
+### Dataset
+
+### Performance
+
+# Zero-shot Classification
+The primary appeal of training such models lies in their zero-shot classification performance: the model can classify any text against any set of labels
+without task-specific training. What sets the Bloomz-3b-NLI LLMs apart in this realm is their ability to model and extract information from significantly more complex
+and lengthy text structures compared to models like BERT, RoBERTa, or CamemBERT.

The zero-shot classification task can be summarized by:
$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
+Here *i* denotes a hypothesis built from a template (for example, "This text is about {}.") and one of the *#C* candidate labels ("cinema", "politics", etc.), so the set
+of hypotheses comprises {"This text is about cinema.", "This text is about politics.", ...}. It is these hypotheses that we will measure against the premise, which
+is the sentence we aim to classify.
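The formula above can be sketched in a few lines of plain Python. This is a minimal illustration of the softmax as written here, not the model's or any library's implementation: the entailment probabilities are made-up placeholder values standing in for the outputs of an NLI model, and `zero_shot_scores` is a hypothetical helper name.

```python
import math


def zero_shot_scores(entailment_probs):
    """Softmax over per-label entailment probabilities, following the formula above."""
    exps = {label: math.exp(p) for label, p in entailment_probs.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Build one hypothesis per candidate label from the template.
template = "This text is about {}."
labels = ["cinema", "politics"]
hypotheses = [template.format(label) for label in labels]

# Placeholder values (assumption): P(premise=entailment | hypothesis=i)
# as an NLI model might return them for a single premise.
entailment_probs = {"cinema": 0.9, "politics": 0.1}

scores = zero_shot_scores(entailment_probs)
best = max(scores, key=scores.get)
print(hypotheses)
print(scores, best)
```

Because the exponent is applied to probabilities in [0, 1], the resulting distribution is fairly flat. The label with the highest entailment probability still receives the highest score.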
+
+### Performance
+
+# How to use Bloomz-3b-NLI

```python
from transformers import pipeline