5roop committed on
Commit
507d26b
1 Parent(s): b159c4d

Added stats, removed distilbert

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -23,7 +23,7 @@ The same pipeline was run with two other models and with the same dataset. Accur
23
  |roberta-base-frenk-hate|0.7915|0.7785|
24
  |xlm-roberta-large |0.7904|0.77876|
25
  |xlm-roberta-base |0.7577|0.7402|
26
- |distilbert-base-uncased-finetuned-sst-2-english|0.7201|0.69862|
27
 
28
 
29
 
@@ -37,15 +37,16 @@ Comparison with `xlm-roberta-base`:
37
  |Mann-Whitney U-test|0.00108|0.00108|
38
  |Student t-test | 1.35e-08 | 1.05e-07|
39
 
40
- Comparison with `distilbert-base-uncased-finetuned-sst-2-english`:
 
41
 
42
  | test | accuracy p-value | macro F1 p-value|
43
  | --- | --- | --- |
44
- |Wilcoxon|0.00781|0.00781|
45
- |Mann-Whitney U-test|0.00108|0.00108|
46
- |Student t-test | 1.33e-12 | 3.03e-12|
47
 
48
- Comparison with `xlm-roberta-large` yielded inconclusive results; whereas accuracy was outperformed by this model, the macro F1 score was not. Neither metric allowed for statistically significant conclusions about which model might be better.
49
 
50
  ## Use examples
51
 
23
  |roberta-base-frenk-hate|0.7915|0.7785|
24
  |xlm-roberta-large |0.7904|0.77876|
25
  |xlm-roberta-base |0.7577|0.7402|
26
+
27
 
28
 
29
 
37
  |Mann-Whitney U-test|0.00108|0.00108|
38
  |Student t-test | 1.35e-08 | 1.05e-07|
39
 
40
+
41
+ Comparison with `xlm-roberta-large` yielded inconclusive results. `roberta-base` has an average accuracy of 0.7915, while `xlm-roberta-large` has an average accuracy of 0.7904. On macro F1, `roberta-base` actually has a lower average than `xlm-roberta-large`: 0.77852 vs. 0.77876. The same statistical tests were performed with the premise that `roberta-base` has the greater metrics, and the results are given below.
42
 
43
  | test | accuracy p-value | macro F1 p-value|
44
  | --- | --- | --- |
45
+ |Wilcoxon|0.188|0.406|
46
+ |Mann-Whitney U-test|0.375|0.649|
47
+ |Student t-test | 0.681| 0.934|
48
 
49
+ With the premise reversed (i.e., that `xlm-roberta-large` has the greater metrics), the Wilcoxon p-value for macro F1 scores reaches 0.656 and the Mann-Whitney p-value is 0.399, while the Student t-test p-value of course stays the same. It was therefore concluded that the performance of the two models is not statistically significantly different.
50
 
51
  ## Use examples
52
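The added README text relies on one-sided tests whose p-value changes when the premise is reversed. The sketch below illustrates that idea with exact, standard-library-only versions of the one-sided Wilcoxon signed-rank and Mann-Whitney U tests. It is not the code behind the README's numbers: the function names and the per-run scores are hypothetical, and the source does not state how many runs were compared or which library was used. The Student t-test is omitted; a two-sided t-test is unaffected by swapping the models.

```python
from itertools import combinations, product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_greater(a, b):
    """Exact one-sided Wilcoxon signed-rank p-value for the premise a > b (paired runs)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    observed = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2**n patterns and count those at least as extreme.
    hits = sum(
        sum(r for r, s in zip(ranks, signs) if s) >= observed - 1e-12
        for signs in product((0, 1), repeat=len(ranks))
    )
    return hits / 2 ** len(ranks)


def mann_whitney_greater(a, b):
    """Exact one-sided Mann-Whitney U p-value for the premise a > b (independent runs)."""
    ranks = average_ranks(list(a) + list(b))
    observed = sum(ranks[: len(a)])
    # Under the null hypothesis every assignment of len(a) ranks to group a
    # is equally likely, so enumerate all of them.
    sums = [sum(c) for c in combinations(ranks, len(a))]
    return sum(s >= observed - 1e-12 for s in sums) / len(sums)


# Hypothetical per-run accuracies, NOT taken from the README.
model_a = [0.80, 0.79, 0.81, 0.78, 0.82, 0.795]
model_b = [0.75, 0.76, 0.74, 0.77, 0.73, 0.755]

print("Wilcoxon, premise a > b:     ", wilcoxon_greater(model_a, model_b))
print("Mann-Whitney, premise a > b: ", mann_whitney_greater(model_a, model_b))
print("Wilcoxon, premise b > a:     ", wilcoxon_greater(model_b, model_a))
print("Mann-Whitney, premise b > a: ", mann_whitney_greater(model_b, model_a))
```

Because the enumeration is exact, the achievable p-values are discrete. With six fully separated runs per model, as in the hypothetical data, the one-sided Wilcoxon p-value is 1/64 and the one-sided Mann-Whitney p-value is 1/924 ≈ 0.00108. For real use, `scipy.stats` provides `wilcoxon`, `mannwhitneyu` and `ttest_ind`, each with an `alternative` parameter; whether the README's figures were produced that way is an assumption.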