Commit c0fead2
1 Parent(s): 403444f
Update README.md
README.md CHANGED
@@ -62,6 +62,56 @@ The model can only do text classification tasks.

Please consult the original DeBERTa paper and the papers for the different datasets for potential biases.

## Metrics

Balanced accuracy metrics on all datasets. `deberta-v3-large-zeroshot-v1.1-heldout` indicates zeroshot performance on the respective dataset.
To calculate these metrics, 28 different models were trained, each with one dataset held out from training to simulate a zeroshot setup.
`deberta-v3-large-zeroshot-v1.1-all-33` was trained on all datasets, with at most 500 texts per class to avoid overfitting.
(The metrics in the last column are therefore not strictly zeroshot.)

|                            | deberta-v3-large-mnli-fever-anli-ling-wanli-binary | deberta-v3-large-zeroshot-v1.1-heldout | deberta-v3-large-zeroshot-v1.1-all-33 |
|:---------------------------|----------------------------------------------------:|----------------------------------------:|---------------------------------------:|
| datasets mean (w/o nli)    | 64.1 | 73.4 | 85.2 |
| amazonpolarity (2)         | 94.7 | 96.6 | 96.8 |
| imdb (2)                   | 90.3 | 95.2 | 95.5 |
| appreviews (2)             | 93.6 | 94.3 | 94.7 |
| yelpreviews (2)            | 98.5 | 98.4 | 98.9 |
| rottentomatoes (2)         | 83.9 | 90.5 | 90.8 |
| emotiondair (6)            | 49.2 | 42.1 | 72.1 |
| emocontext (4)             | 57   | 69.3 | 82.4 |
| empathetic (32)            | 42   | 34.4 | 58   |
| financialphrasebank (3)    | 77.4 | 77.5 | 91.9 |
| banking77 (72)             | 29.1 | 52.8 | 72.2 |
| massive (59)               | 47.3 | 64.7 | 77.3 |
| wikitoxic_toxicaggreg (2)  | 81.6 | 86.6 | 91   |
| wikitoxic_obscene (2)      | 85.9 | 91.9 | 93.1 |
| wikitoxic_threat (2)       | 77.9 | 93.7 | 97.6 |
| wikitoxic_insult (2)       | 77.8 | 91.1 | 92.3 |
| wikitoxic_identityhate (2) | 86.4 | 89.8 | 95.7 |
| hateoffensive (3)          | 62.8 | 66.5 | 88.4 |
| hatexplain (3)             | 46.9 | 61   | 76.9 |
| biasframes_offensive (2)   | 62.5 | 86.6 | 89   |
| biasframes_sex (2)         | 87.6 | 89.6 | 92.6 |
| biasframes_intent (2)      | 54.8 | 88.6 | 89.9 |
| agnews (4)                 | 81.9 | 82.8 | 90.9 |
| yahootopics (10)           | 37.7 | 65.6 | 74.3 |
| trueteacher (2)            | 51.2 | 54.9 | 86.6 |
| spam (2)                   | 52.6 | 51.8 | 97.1 |
| wellformedquery (2)        | 49.9 | 40.4 | 82.7 |
| manifesto (56)             | 10.6 | 29.4 | 44.1 |
| capsotu (21)               | 23.2 | 69.4 | 74   |
| mnli_m (2)                 | 93.1 | nan  | 93.1 |
| mnli_mm (2)                | 93.2 | nan  | 93.2 |
| fevernli (2)               | 89.3 | nan  | 89.5 |
| anli_r1 (2)                | 87.9 | nan  | 87.3 |
| anli_r2 (2)                | 76.3 | nan  | 78   |
| anli_r3 (2)                | 73.6 | nan  | 74.1 |
| wanli (2)                  | 82.8 | nan  | 82.7 |
| lingnli (2)                | 90.2 | nan  | 89.6 |

## License
The base model (DeBERTa-v3) is published under the MIT license.
The datasets the model was fine-tuned on are published under a diverse set of licenses.