Tags: Zero-Shot Classification · Transformers · PyTorch · Safetensors · bert · text-classification · Inference Endpoints
saattrupdan committed
Commit e06ebc7
1 Parent(s): 17d420d

Update README.md

Files changed (1): README.md (+54 −2)
README.md CHANGED
@@ -66,9 +66,28 @@ You can use this model in your scripts as follows:
 
 ## Performance
 
-As Danish is, as far as we are aware, the only Scandinavian language with a gold standard NLI dataset, namely the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439), we report evaluation scores on the test split of that dataset.
+We assess the models both on their aggregate Scandinavian performance and on their language-specific Danish, Swedish and Norwegian Bokmål performance.
 
-We report the Matthews Correlation Coefficient (MCC), macro-average F1-score and accuracy.
+In all cases, we report the Matthews Correlation Coefficient (MCC), macro-average F1-score and accuracy.
+
+
+### Scandinavian Evaluation
+
+The Scandinavian scores are the average of the Danish, Swedish and Norwegian scores, which can be found in the sections below.
+
+| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+| :-------- | :------ | :----------- | :----------- | :----------------------- |
+| `alexandrainst/scandi-nli-large` (this) | asd | asd | asd | 354M |
+| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 69.01% | 71.99% | 80.66% | 279M |
+| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
+| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 63.94% | 70.41% | 77.23% | 279M |
+| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
+
+
+### Danish Evaluation
+
+We use a test split of the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439) to evaluate the Danish performance of the models.
 
 | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
 | :-------- | :------ | :----------- | :----------- | :----------------------- |
@@ -80,6 +99,39 @@ We report the Matthews Correlation Coefficient (MCC), macro-average F1-score as wel
 | [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 47.28% | 48.88% | 73.46% | **22M** |
 
 
+### Swedish Evaluation
+
+We use the test split of the machine-translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Swedish performance of the models.
+
+We acknowledge that evaluating on a machine-translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Swedish.
+
+| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+| :-------- | :------ | :----------- | :----------- | :----------------------- |
+| `alexandrainst/scandi-nli-large` (this) | asd | asd | asd | 354M |
+| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
+| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 73.84% | 82.46% | 82.58% | 279M |
+| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 73.32% | 82.15% | 82.08% | 279M |
+| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
+
+
+### Norwegian Evaluation
+
+We use the test split of the machine-translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Norwegian performance of the models.
+
+We acknowledge that evaluating on a machine-translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Norwegian.
+
+| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+| :-------- | :------ | :----------- | :----------- | :----------------------- |
+| `alexandrainst/scandi-nli-large` (this) | asd | asd | asd | 354M |
+| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 65.33% | 76.73% | 76.65% | 279M |
+| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
+| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | asd | asd | asd | 178M |
+| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 65.18% | 76.76% | 76.77% | 279M |
+| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
+
+
 ## Training procedure
 
 It has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine-translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine-translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.
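The hunk header references the card's usage section ("You can use this model in your scripts as follows"), which relies on zero-shot classification built on NLI: each candidate label is inserted into a hypothesis sentence, the model scores premise-hypothesis entailment, and the entailment scores are normalized across labels. A minimal sketch of that final normalization step, with hypothetical entailment logits standing in for real model outputs:

```python
import math

def zero_shot_probs(entailment_logits):
    """Normalize per-label entailment logits into label probabilities.

    Zero-shot NLI classification runs one premise-hypothesis pair per
    candidate label (e.g. a Danish hypothesis template like
    "Dette eksempel er {label}.") and applies a softmax over the
    entailment scores across labels.
    """
    m = max(entailment_logits.values())  # subtract max for numerical stability
    exps = {label: math.exp(v - m) for label, v in entailment_logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical entailment logits for three candidate labels (not real model
# outputs; label names are illustrative only).
logits = {"positiv": 2.0, "negativ": -1.0, "neutral": 0.5}
probs = zero_shot_probs(logits)
best = max(probs, key=probs.get)
```

In the `transformers` zero-shot pipeline the same normalization happens internally; this sketch only shows the mechanics, not the model call.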
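The three metrics reported in the evaluation tables can be computed from a list of gold labels and predictions. A self-contained pure-Python sketch, using the standard multiclass (Gorodkin) formulation of MCC; scikit-learn's `matthews_corrcoef` and `f1_score(average="macro")` compute the same quantities:

```python
import math
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

def mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient (Gorodkin)."""
    s = len(y_true)                                   # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    cov = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = math.sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    return cov / denom if denom else 0.0
```

For a perfect prediction MCC and macro-F1 are both 1.0, and for a systematically inverted one MCC reaches −1.0, which is why the card reports MCC alongside accuracy on the class-imbalanced DanFEVER test split.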
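The Scandinavian Evaluation section defines the aggregate score as the plain average of the Danish, Swedish and Norwegian scores. A trivial sketch of that aggregation, with hypothetical per-language MCC values (several cells in this commit are still placeholders, so no real numbers are assumed):

```python
# Hypothetical per-language MCC values for one model; the card's
# Scandinavian score is defined as their unweighted average.
per_language_mcc = {"danish": 0.70, "swedish": 0.7332, "norwegian": 0.6533}
scandinavian_mcc = sum(per_language_mcc.values()) / len(per_language_mcc)
```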