Cyrile committed
Commit 67cca49
1 Parent(s): 9bede78

Update README.md

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -20,8 +20,8 @@ This modelization is close to [tblard/tf-allocine](https://huggingface.co/tblard
Dataset
-------

- The dataset is composed of XXX,XXX reviews for training and X,XXX review for the test issue of Amazon, and respectively XXX,XXX and X,XXX critics issue of Allocine website. The dataset is labeled into 5 categories:
- * 1 star: represent very bad appreciation,
+ The dataset is composed of XXX,XXX reviews for training and X,XXX reviews for the test coming from Amazon, and respectively XXX,XXX and X,XXX reviews from the Allocine website. The dataset is labeled into 5 categories:
+ * 1 star: represents a very bad appreciation,
* 2 stars: bad appreciation,
* 3 stars: neutral appreciation,
* 4 stars: good appreciation,
@@ -36,12 +36,12 @@ Benchmark
This model is compared to 3 reference models (see below). As each model doesn't have the same definition of targets, we detail the performance measure used for each of them. For the mean inference time measure, an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores** was used.

#### bert-base-multilingual-uncased-sentiment
- [nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) is based on BERT model in multilingual and uncased version. This sentiment analyzer is trained on Amazon review like our model, then the targets and their definition are the same. In order to be robust to +/-1 star estimation errors, we will take the following definition as a performance measure:
+ [nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) is based on the BERT model in its multilingual and uncased version. This sentiment analyzer is trained on Amazon reviews, like our model, so the targets and their definitions are the same. In order to be robust to +/-1 star estimation errors, we take the following definition as a performance measure:
$$acc=\frac{1}{|\mathcal{O}|}\sum_{i\in\mathcal{O}}\sum_{0\leq l < 5}p_{i,l}\hat{p}_{i,l},$$
- where $\mathcal{O}$ is the test set of the observations, $p_l\in\{0,1\}$ is equal at 1 for the true label and $\hat{p}_l$ the estimated probability for the l-th label.
+ where $\mathcal{O}$ is the test set of observations, $p_{i,l}\in\{0,1\}$ is equal to 1 for the true label and $\hat{p}_{i,l}$ is the estimated probability of the $l$-th label.

#### tf-allociné and barthez-sentiment-classification
- [tblard/tf-allocine](https://huggingface.co/tblard/tf-allocine) and [moussaKam/barthez-sentiment-classification](https://huggingface.co/moussaKam/barthez-sentient-classification) use the same bi-class definition between them. To bring this back to a two-class problem, we will consider only the "1 star" and "2 stars" labels for the "negative" sentiments and "4 stars" and "5 stars" for "positive" sentiments. We exclude the "3 stars" can witch interpreted as "neutral" class. In this context, the problem of +/-1 star estimation errors disappears. Then we use the classical accuracy definition.
+ [tblard/tf-allocine](https://huggingface.co/tblard/tf-allocine) and [moussaKam/barthez-sentiment-classification](https://huggingface.co/moussaKam/barthez-sentiment-classification) share the same bi-class target definition. To bring our model back to a two-class problem, we only consider the "1 star" and "2 stars" labels for the "negative" sentiment and the "4 stars" and "5 stars" labels for the "positive" sentiment. We exclude "3 stars", which can be interpreted as a "neutral" class. In this context, the problem of +/-1 star estimation errors disappears, and we use the classical accuracy definition.

How to use DistilCamemBERT-Sentiment
------------------------------------
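To make the two performance measures in the hunk above concrete, here is a minimal sketch of how they could be computed. It assumes the true labels are encoded as integers 0 to 4 (for "1 star" to "5 stars", matching the $0\leq l < 5$ indexing of the formula) and that predictions come as an (N, 5) array of probabilities; the function names and the way predicted probabilities are collapsed to two classes are illustrative choices, not taken from the model card:

```python
import numpy as np

def weighted_accuracy(y_true: np.ndarray, probs: np.ndarray) -> float:
    """acc = (1/|O|) * sum_i sum_l p_{i,l} * p_hat_{i,l}. With one-hot
    p_{i,l}, this is the mean estimated probability of the true label."""
    one_hot = np.eye(5)[y_true]                # p_{i,l}: 1 at the true label
    return float((one_hot * probs).sum(axis=1).mean())

def binary_accuracy(y_true: np.ndarray, probs: np.ndarray) -> float:
    """Classical accuracy on the two-class reduction: "1 star"/"2 stars"
    count as negative, "4 stars"/"5 stars" as positive, and reviews whose
    true label is "3 stars" are excluded. Summing the predicted mass on
    each side is an assumption; the card does not spell this out."""
    kept = probs[y_true != 2]                  # drop true "3 stars" reviews
    negative = kept[:, :2].sum(axis=1)         # mass on "1 star" + "2 stars"
    positive = kept[:, 3:].sum(axis=1)         # mass on "4 stars" + "5 stars"
    true_positive = y_true[y_true != 2] > 2    # labels 3 and 4 are positive
    return float(((positive > negative) == true_positive).mean())

# Toy check: one confidently predicted 5-star review.
y = np.array([4])
p = np.array([[0.01, 0.01, 0.03, 0.15, 0.80]])
print(weighted_accuracy(y, p))  # 0.8
print(binary_accuracy(y, p))    # 1.0
```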
 
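The diff stops at the "How to use" section header. For orientation, a minimal sketch of the kind of usage such a section typically shows with the transformers pipeline API: the checkpoint id cmarkea/distilcamembert-base-sentiment is an assumption (this commit never names it), the example sentence and label format are illustrative, and the timing loop is just one way to estimate a mean inference time like the one quoted in the benchmark:

```python
import time

from transformers import pipeline

# Checkpoint id is an assumption; this commit page never names the model.
analyzer = pipeline(
    task="text-classification",
    model="cmarkea/distilcamembert-base-sentiment",
)

result = analyzer("Un film excellent, je le recommande !")
print(result)  # e.g. [{'label': '5 stars', 'score': 0.9...}] -- format assumed

# A simple way to estimate the mean inference time on CPU:
n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    analyzer("Un film excellent, je le recommande !")
print(f"mean inference time: {(time.perf_counter() - start) / n_runs:.3f} s")
```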