Update README.md
add optimum + onnx
README.md
CHANGED
@@ -17,14 +17,14 @@ datasets:


17 
DistilCamemBERT-NLI

18 
===================

19 

20 

We present DistilCamemBERT-NLI which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for the Natural Language Inference (NLI) task for the french language, also known as recognizing textual entailment (RTE). This model is constructed on the XNLI dataset which

21 

22 

This modelization is close to [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) based on [CamemBERT](https://huggingface.co/camembert-base) model. The problem of the modelizations based on CamemBERT is at the scaling moment, for the production phase for example. Indeed, inference cost can be a technological issue especially

23 

24 
Dataset

25 


26 

27 

The dataset XNLI from [FLUE](https://huggingface.co/datasets/flue)

28 
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$

29 

30 
Evaluation results

@@ -40,7 +40,7 @@ Evaluation results


40 
Benchmark

41 


42 

43 

We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) model to 2 other modelizations working on french language. The first one [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) is based on well named [CamemBERT](https://huggingface.co/camembert-base), the french RoBERTa model and the second one [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) based on [mDeBERTav3](https://huggingface.co/microsoft/mdeberta-v3-base) a multilingual model. To compare the performances the metrics of accuracy and [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient)

44 

45 
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |

46 
| :---: | :---: | :---: | :---: |

@@ -54,7 +54,7 @@ Zero-shot classification


54 
The main advantage of such a model is that it yields a zero-shot classifier, allowing text classification without training. This task can be summarized by:

55 
$$P(hypothesis=i\in\mathcal{C}\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$

56 

57 

For this part, we use

58 

59 
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |

60 
| :---: | :---: | :---: | :---: |

@@ -62,7 +62,7 @@ For this part, we use 2 datasets, the first one: [allocine](https://huggingface.


62 
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 378.39 | **86.37** | **73.74** |

63 
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 520.58 | 84.97 | 70.05 |

64 

65 

The second one: [mlsum](https://huggingface.co/datasets/mlsum) used to train the summarization models.

66 

67 
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |

68 
| :---: | :---: | :---: | :---: |

@@ -103,6 +103,24 @@ result


103 
0.0455702543258667]}

104 
```

105 

106 
Citation

107 


108 
```bibtex



17 
DistilCamemBERT-NLI

18 
===================

19 

20 
+
We present DistilCamemBERT-NLI, which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for the Natural Language Inference (NLI) task for the French language, also known as recognizing textual entailment (RTE). This model is trained on the XNLI dataset, whose task is to determine whether a premise entails, contradicts, or neither entails nor contradicts a hypothesis.

21 

22 
+
This model is close to [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli), which is based on the [CamemBERT](https://huggingface.co/camembert-base) model. The problem with CamemBERT-based models arises at scaling time, for example in the production phase. Indeed, inference cost can be a technological issue, especially in the context of cross-encoding like this task. To counteract this effect, we propose this model, which divides the inference time by 2 at the same power consumption, thanks to DistilCamemBERT.

23 

24 
Dataset

25 


26 

27 
+
The XNLI dataset from [FLUE](https://huggingface.co/datasets/flue) comprises 392,702 premise-hypothesis pairs for training and 5,010 pairs for testing. The goal is to predict textual entailment (does sentence A imply, contradict, or neither imply nor contradict sentence B?), which is a classification task (given two sentences, predict one of three labels). Sentence A is called the *premise* and sentence B the *hypothesis*; the modeling objective is then:

28 
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$

29 

30 
Evaluation results



40 
Benchmark

41 


42 

43 
+
We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) model to 2 other models working on the French language. The first one, [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli), is based on the well-named [CamemBERT](https://huggingface.co/camembert-base), the French RoBERTa model, and the second one, [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli), is based on [mDeBERTav3](https://huggingface.co/microsoft/mdeberta-v3-base), a multilingual model. To compare performance, accuracy and [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient) were used. Mean inference time was measured on an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores**.
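
As an illustration of the two quality metrics reported in these benchmarks, here is a minimal sketch that computes accuracy and the multiclass MCC from label counts. The function names and the toy labels are assumptions made for this example; they are not the evaluation code behind the reported figures.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from label counts."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)  # how often each label truly occurs
    pred_counts = Counter(y_pred)  # how often each label is predicted
    labels = set(true_counts) | set(pred_counts)
    # Covariance-style terms of the multiclass MCC definition.
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0 when one side has a single label only.
    return cov_tp / denom if denom else 0.0


# Toy labels, invented for illustration only.
y_true = ["entailment", "neutral", "contradiction", "neutral"]
y_pred = ["entailment", "neutral", "neutral", "neutral"]
print(f"accuracy (%): {100 * accuracy(y_true, y_pred):.2f}")
print(f"MCC (x100):   {100 * matthews_corrcoef(y_true, y_pred):.2f}")
```

Unlike accuracy, MCC accounts for how predictions are distributed across all classes, which makes it more informative when classes are imbalanced.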

44 

45 
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |

46 
| :---: | :---: | :---: | :---: |



54 
The main advantage of such a model is that it yields a zero-shot classifier, allowing text classification without training. This task can be summarized by:

55 
$$P(hypothesis=i\in\mathcal{C}\vert premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
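
The formula above is a softmax over the entailment scores obtained for each candidate label. A minimal sketch in plain Python follows; the function name and the score values are hypothetical, chosen only to show the normalization. In practice the `transformers` zero-shot pipeline applies the same normalization to the entailment logits rather than to probabilities.

```python
from math import exp


def zero_shot_probabilities(entailment_scores):
    """Softmax over per-label entailment scores, as in the formula above."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    top = max(entailment_scores.values())
    weights = {label: exp(s - top) for label, s in entailment_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical entailment scores for a single premise, one per candidate label.
scores = {"positif": 0.92, "négatif": 0.05}
probs = zero_shot_probabilities(scores)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```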

56 

57 
+
For this part, we use two datasets. The first one, [allocine](https://huggingface.co/datasets/allocine), is used to train sentiment analysis models. The dataset comprises two classes of movie reviews: "positif" and "négatif". Here we use "Ce commentaire est {}." as the hypothesis template and "positif" and "négatif" as candidate labels.

58 

59 
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |

60 
| :---: | :---: | :---: | :---: |



62 
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 378.39 | **86.37** | **73.74** |

63 
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 520.58 | 84.97 | 70.05 |

64 

65 
+
The second one, [mlsum](https://huggingface.co/datasets/mlsum), is used to train summarization models. For this evaluation, we aggregate subtopics and select a few of them. We use the summary part of each article to predict its topic. In this case, the hypothesis template is "C'est un article traitant de {}." and the candidate labels are: "économie", "politique", "sport" and "science".

66 

67 
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |

68 
| :---: | :---: | :---: | :---: |



103 
0.0455702543258667]}

104 
```

105 

106 
+
### Optimum + ONNX

107 
+

108 
+
```python

109 
+
from optimum.onnxruntime import ORTModelForSequenceClassification

110 
+
from transformers import AutoTokenizer, pipeline

111 
+

112 
+
HUB_MODEL = "cmarkea/distilcamembert-base-nli"

113 
+

114 
+
tokenizer = AutoTokenizer.from_pretrained(HUB_MODEL)

115 
+
model = ORTModelForSequenceClassification.from_pretrained(HUB_MODEL)

116 
+
onnx_qa = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

117 
+

118 
+
# Quantized onnx model

119 
+
quantized_model = ORTModelForSequenceClassification.from_pretrained(

120 
+
    HUB_MODEL, file_name="model_quantized.onnx"

121 
+
)

122 
+
```
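
To check whether the ONNX or quantized variant actually reduces latency on a given machine, the mean inference time can be measured in the spirit of the benchmark tables. The helper below is only a sketch: `mean_latency_ms` and `dummy_predict` are hypothetical names, and the stand-in callable replaces a real pipeline so the snippet runs without downloading a model.

```python
from statistics import mean
from time import perf_counter


def mean_latency_ms(predict, inputs, warmup=1):
    """Mean wall-clock time per call to `predict`, in milliseconds."""
    # Warm-up calls are excluded from timing (lazy initialisation, caches).
    for text in inputs[:warmup]:
        predict(text)
    timings = []
    for text in inputs:
        start = perf_counter()
        predict(text)
        timings.append((perf_counter() - start) * 1000.0)
    return mean(timings)


# Stand-in for a real classifier call (assumption for this sketch).
def dummy_predict(text):
    return {"sequence": text, "labels": [], "scores": []}


latency = mean_latency_ms(dummy_predict, ["Le film était excellent.", "Quel ennui."])
print(f"mean latency: {latency:.3f} ms")
```

With the pipeline defined above, `predict` could for instance be `lambda text: onnx_qa(text, candidate_labels=["positif", "négatif"], hypothesis_template="Ce commentaire est {}.")`. The quantized model would first need to be wrapped in its own pipeline in the same way.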

123 
+

124 
Citation

125 


126 
```bibtex
