Commit c0fead2
1 Parent(s): 403444f
Update README.md
README.md CHANGED
@@ -62,6 +62,56 @@ The model can only do text classification tasks.

Please consult the original DeBERTa paper and the papers for the different datasets for potential biases.

## Metrics

Balanced accuracy metrics on all datasets. `deberta-v3-large-zeroshot-v1.1-heldout` indicates zeroshot performance on the respective dataset.
To calculate these metrics, 28 different models were trained, each with one dataset held out from training to simulate a zeroshot setup.
`deberta-v3-large-zeroshot-v1.1-all-33` was trained on all datasets, with at most 500 texts per class to avoid overfitting.
(The metrics in the last column are therefore not strictly zeroshot.)

|                            | deberta-v3-large-mnli-fever-anli-ling-wanli-binary | deberta-v3-large-zeroshot-v1.1-heldout | deberta-v3-large-zeroshot-v1.1-all-33 |
|:---------------------------|----------------------------------------------------:|----------------------------------------:|---------------------------------------:|
| datasets mean (w/o nli)    | 64.1 | 73.4 | 85.2 |
| amazonpolarity (2)         | 94.7 | 96.6 | 96.8 |
| imdb (2)                   | 90.3 | 95.2 | 95.5 |
| appreviews (2)             | 93.6 | 94.3 | 94.7 |
| yelpreviews (2)            | 98.5 | 98.4 | 98.9 |
| rottentomatoes (2)         | 83.9 | 90.5 | 90.8 |
| emotiondair (6)            | 49.2 | 42.1 | 72.1 |
| emocontext (4)             | 57   | 69.3 | 82.4 |
| empathetic (32)            | 42   | 34.4 | 58   |
| financialphrasebank (3)    | 77.4 | 77.5 | 91.9 |
| banking77 (72)             | 29.1 | 52.8 | 72.2 |
| massive (59)               | 47.3 | 64.7 | 77.3 |
| wikitoxic_toxicaggreg (2)  | 81.6 | 86.6 | 91   |
| wikitoxic_obscene (2)      | 85.9 | 91.9 | 93.1 |
| wikitoxic_threat (2)       | 77.9 | 93.7 | 97.6 |
| wikitoxic_insult (2)       | 77.8 | 91.1 | 92.3 |
| wikitoxic_identityhate (2) | 86.4 | 89.8 | 95.7 |
| hateoffensive (3)          | 62.8 | 66.5 | 88.4 |
| hatexplain (3)             | 46.9 | 61   | 76.9 |
| biasframes_offensive (2)   | 62.5 | 86.6 | 89   |
| biasframes_sex (2)         | 87.6 | 89.6 | 92.6 |
| biasframes_intent (2)      | 54.8 | 88.6 | 89.9 |
| agnews (4)                 | 81.9 | 82.8 | 90.9 |
| yahootopics (10)           | 37.7 | 65.6 | 74.3 |
| trueteacher (2)            | 51.2 | 54.9 | 86.6 |
| spam (2)                   | 52.6 | 51.8 | 97.1 |
| wellformedquery (2)        | 49.9 | 40.4 | 82.7 |
| manifesto (56)             | 10.6 | 29.4 | 44.1 |
| capsotu (21)               | 23.2 | 69.4 | 74   |
| mnli_m (2)                 | 93.1 | nan  | 93.1 |
| mnli_mm (2)                | 93.2 | nan  | 93.2 |
| fevernli (2)               | 89.3 | nan  | 89.5 |
| anli_r1 (2)                | 87.9 | nan  | 87.3 |
| anli_r2 (2)                | 76.3 | nan  | 78   |
| anli_r3 (2)                | 73.6 | nan  | 74.1 |
| wanli (2)                  | 82.8 | nan  | 82.7 |
| lingnli (2)                | 90.2 | nan  | 89.6 |

## License
The base model (DeBERTa-v3) is published under the MIT license.
The datasets the model was fine-tuned on are published under a diverse set of licenses.