Cyrile committed on
Commit 588ca27 · 1 Parent(s): c135611

Update README.md

Files changed (1)
  1. README.md +19 -3
README.md CHANGED
@@ -30,19 +30,35 @@ The dataset is composed of 204,993 reviews for training and 4,999 reviews for th
  #### bert-base-multilingual-uncased-sentiment
- [nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) is based on the multilingual, uncased version of BERT. This sentiment analyzer is trained on Amazon reviews, like our model, so the targets and their definitions are the same. To be robust to +/-1 star estimation errors, we use the following definition as a performance measure:
- $$acc=\frac{1}{|\mathcal{O}|}\sum_{i\in\mathcal{O}}\sum_{0\leq l < 5}p_{i,l}\hat{p}_{i,l}$$
- where $\mathcal{O}$ is the set of test observations, $p_{i,l}\in\{0,1\}$ is equal to 1 for the true label of observation $i$ and 0 otherwise, and $\hat{p}_{i,l}$ is the estimated probability of the l-th label.
  Evaluation results
  ------------------

+ To be robust to +/-1 star estimation errors, we use the following definition as a performance measure:
+ $$\mathrm{top\!-\!2\; acc}=\frac{1}{|\mathcal{O}|}\sum_{i\in\mathcal{O}}\sum_{0\leq l < 2}\mathbb{1}(\hat{f}_{i,l}=y_i)$$
+ where $\hat{f}_{i,l}$ is the label with the l-th largest predicted probability for observation $i$ (with $l=0$ the most probable), $y_i$ is the true label, $\mathcal{O}$ is the set of test observations, and $\mathbb{1}$ is the indicator function.
+
+ | **exact accuracy (%)** | **top-2 acc (%)** |
+ | :--------------------: | :---------------: |
+ | 61.01 | 88.80 |
+
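As a minimal sketch (not the authors' evaluation code), the top-2 accuracy defined above could be computed as follows. The function name `top_k_accuracy`, the 0-indexed label convention (0 for *"1 star"* up to 4 for *"5 stars"*), and the toy probabilities are assumptions made for illustration only.

```python
from typing import Sequence


def top_k_accuracy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 2,
) -> float:
    """Fraction of observations whose true label is among the k most probable labels."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not labels:
        raise ValueError("at least one observation is required")

    hits = 0
    for p, y in zip(probs, labels):
        # Label indices sorted by decreasing predicted probability.
        ranked = sorted(range(len(p)), key=lambda l: p[l], reverse=True)
        if y in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (made up for this example, not model outputs).
probs = [
    [0.10, 0.60, 0.20, 0.05, 0.05],  # true label 2 is the 2nd most probable
    [0.70, 0.10, 0.10, 0.05, 0.05],  # true label 0 is the most probable
    [0.05, 0.05, 0.10, 0.30, 0.50],  # true label 1 is outside the top 2
    [0.05, 0.10, 0.45, 0.30, 0.10],  # true label 3 is the 2nd most probable
]
labels = [2, 0, 1, 3]

exact_acc = top_k_accuracy(probs, labels, k=1)
top2_acc = top_k_accuracy(probs, labels, k=2)
print(f"exact accuracy: {exact_acc:.2f}, top-2 accuracy: {top2_acc:.2f}")
```

With `k=1` the same function gives the exact accuracy, which makes the gap between the two columns of the table easy to reproduce on any set of predicted probabilities.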
  Benchmark
  ---------

  This model is compared to 3 reference models (see below). Since the models do not all share the same target definition, we detail the performance measure used for each of them. The mean inference time was measured on an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores**.

  #### bert-base-multilingual-uncased-sentiment
+ [nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) is based on the multilingual, uncased version of BERT. This sentiment analyzer is trained on Amazon reviews, like our model, so the targets and their definitions are the same.
+
+ | **model** | **time (ms)** | **exact accuracy (%)** | **top-2 acc (%)** |
+ | :-------: | :------: | :--------------------: | :---------------: |
+ | [cmarkea/distilcamembert-base-sentiment]() | 95.56 | 61.01 | 88.80 |
+ | [nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) | 187.70 | 54.41 | 82.82 |

  #### tf-allociné and barthez-sentiment-classification
  [tblard/tf-allocine](https://huggingface.co/tblard/tf-allocine), based on the [CamemBERT](https://huggingface.co/camembert-base) model, and [moussaKam/barthez-sentiment-classification](https://huggingface.co/moussaKam/barthez-sentiment-classification), based on [BARThez](https://huggingface.co/moussaKam/barthez), share the same two-class definition. To reduce our task to a two-class problem, we only consider the *"1 star"* and *"2 stars"* labels as *negative* sentiment and the *"4 stars"* and *"5 stars"* labels as *positive* sentiment. We exclude *"3 stars"*, which can be interpreted as a *neutral* class. In this setting, the problem of +/-1 star estimation errors disappears, so we use the classical accuracy definition.

+ | **model** | **time (ms)** | **exact accuracy (%)** |
+ | [cmarkea/distilcamembert-base-sentiment]() | 95.56 | 97.52 |
+ | [tblard/tf-allocine](https://huggingface.co/tblard/tf-allocine) | 329.74 | 95.69 |
+ | [moussaKam/barthez-sentiment-classification](https://huggingface.co/moussaKam/barthez-sentiment-classification) | 197.95 | 94.29 |
+
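The two-class reduction described above can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the README does not say how a 5-star prediction is turned into a binary one, so this sketch compares the probability mass on *"1 star"* and *"2 stars"* with the mass on *"4 stars"* and *"5 stars"*, ignores the *"3 stars"* probability, and skips observations whose true label is *"3 stars"*. The helper names and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def stars_to_binary(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a binary sentiment; 3 stars is treated as neutral."""
    if stars in (1, 2):
        return "negative"
    if stars in (4, 5):
        return "positive"
    if stars == 3:
        return None
    raise ValueError(f"unexpected star rating: {stars}")


def binary_prediction(probs: Sequence[float]) -> str:
    """Assumed rule: compare the mass on 1-2 stars with the mass on 4-5 stars."""
    negative_mass = probs[0] + probs[1]
    positive_mass = probs[3] + probs[4]
    # Ties are resolved as "negative" in this sketch.
    return "positive" if positive_mass > negative_mass else "negative"


def binary_accuracy(
    true_stars: Sequence[int],
    probs: Sequence[Sequence[float]],
) -> float:
    """Classical accuracy over the observations whose true rating is not 3 stars."""
    correct = 0
    total = 0
    for stars, p in zip(true_stars, probs):
        target = stars_to_binary(stars)
        if target is None:
            continue  # neutral reviews are excluded from the binary benchmark
        total += 1
        if binary_prediction(p) == target:
            correct += 1
    if total == 0:
        raise ValueError("no non-neutral observations to evaluate")
    return correct / total


# Toy data (made up for this example, not model outputs).
true_stars = [1, 5, 3, 4, 2]
probs = [
    [0.60, 0.20, 0.10, 0.05, 0.05],  # predicted negative, true negative
    [0.05, 0.05, 0.20, 0.30, 0.40],  # predicted positive, true positive
    [0.10, 0.20, 0.40, 0.20, 0.10],  # true label is 3 stars: skipped
    [0.30, 0.30, 0.20, 0.10, 0.10],  # predicted negative, true positive
    [0.10, 0.50, 0.30, 0.05, 0.05],  # predicted negative, true negative
]

accuracy = binary_accuracy(true_stars, probs)
print(f"binary accuracy: {accuracy:.2f}")
```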
  How to use DistilCamemBERT-Sentiment
  ------------------------------------