classla/roberta-base-frenk-hate

5roop committed on Sep 14, 2021

Commit 507d26b • 1 Parent(s): b159c4d

Added stats, removed distilbert

Files changed (1)

README.md +7 -6

@@ -23,7 +23,7 @@ The same pipeline was run with two other models and with the same dataset. Accur
 |roberta-base-frenk-hate|0.7915|0.7785|
 |xlm-roberta-large |0.7904|0.77876|
 |xlm-roberta-base |0.7577|0.7402|
-|distilbert-base-uncased-finetuned-sst-2-english|0.7201|0.69862|
@@ -37,15 +37,16 @@ Comparison with `xlm-roberta-base`:
 |Mann-Whitney U-test|0.00108|0.00108|
 |Student t-test | 1.35e-08 | 1.05e-07|
-Comparison with `distilbert-base-uncased-finetuned-sst-2-english`:
+Comparison with `xlm-roberta-large` yielded inconclusive results. `roberta-base` has an average accuracy of 0.7915, while `xlm-roberta-large` has an average accuracy of 0.7904. On macro F1, `roberta-base` actually has a lower average than `xlm-roberta-large`: 0.77852 vs. 0.77876, respectively. The same statistical tests were performed under the premise that `roberta-base` has the greater metrics; the results are given below.
 | test | accuracy p-value | macro F1 p-value|
 | --- | --- | --- |
-|Wilcoxon|0.00781|0.00781|
-|Mann-Whitney U-test|0.00108|0.00108|
-|Student t-test | 1.33e-12 | 3.03e-12|
+|Wilcoxon|0.188|0.406|
+|Mann-Whitney U-test|0.375|0.649|
+|Student t-test | 0.681| 0.934|
-Comparison with `xlm-roberta-large` yielded inconclusive results; whereas accuracy was outperformed by this model, the macro F1 score was not. Neither metric allowed for statistically significant conclusions about which model might be better.
+With the premise reversed (i.e., that `xlm-roberta-large` has the greater metrics), the Wilcoxon p-value for macro F1 scores is 0.656 and the Mann-Whitney p-value is 0.399, while the Student t-test p-value stays the same. It was therefore concluded that the performance of the two models is not statistically significantly different.
 ## Use examples