Commit e06ebc7 (1 parent: 17d420d), committed by saattrupdan

Update README.md

README.md CHANGED

@@ -66,9 +66,28 @@ You can use this model in your scripts as follows:

## Performance

We assess the models both on their aggregate Scandinavian performance and on their language-specific Danish, Swedish and Norwegian Bokmål performance.

In all cases, we report the Matthews Correlation Coefficient (MCC), the macro-average F1-score and the accuracy.
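
The three metrics can be computed with `scikit-learn`; the snippet below is a minimal sketch with dummy predictions and an assumed three-way label encoding, not the evaluation script used for the tables in this card.

```python
# Sketch: computing the three reported metrics with scikit-learn.
# The gold labels and predictions are dummy placeholders, and the
# encoding 0 = entailment, 1 = neutral, 2 = contradiction is an assumption.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = [0, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 0, 1, 1]

print(f"MCC:      {matthews_corrcoef(y_true, y_pred):.2%}")
print(f"Macro-F1: {f1_score(y_true, y_pred, average='macro'):.2%}")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")
```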

### Scandinavian Evaluation

The Scandinavian scores are the average of the Danish, Swedish and Norwegian scores, which can be found in the sections below.

| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | asd | asd | asd | 354M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 69.01% | 71.99% | 80.66% | 279M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 63.94% | 70.41% | 77.23% | 279M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |
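
As a small illustration of how the aggregate row is formed, the sketch below averages hypothetical per-language scores; the numbers are placeholders, not results from the tables in this card.

```python
# Sketch: the Scandinavian score is the mean of the three per-language scores.
# All values here are illustrative placeholders.
danish = {"mcc": 0.70, "macro_f1": 0.72, "accuracy": 0.81}
swedish = {"mcc": 0.73, "macro_f1": 0.82, "accuracy": 0.82}
norwegian = {"mcc": 0.65, "macro_f1": 0.77, "accuracy": 0.77}

scandinavian = {
    metric: (danish[metric] + swedish[metric] + norwegian[metric]) / 3
    for metric in danish
}
print(scandinavian)
```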

### Danish Evaluation

We use a test split of the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439) to evaluate the Danish performance of the models.

| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
@@ -80,6 +99,39 @@ We report Matthew's Correlation Coefficient (MCC), macro-average F1-score as well
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 47.28% | 48.88% | 73.46% | **22M** |
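
A minimal sketch of how such an NLI evaluation could be run with `transformers` is shown below. The premise–hypothesis pairs and the label lookup are illustrative assumptions (in particular, it assumes the model's `id2label` mapping uses the standard NLI label names); it is not the evaluation code behind the tables in this card.

```python
# Sketch of an NLI evaluation loop with Hugging Face transformers.
# The example pairs and the label-name assumption below are for illustration only.
import torch
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "alexandrainst/scandi-nli-large"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Illustrative premise/hypothesis pairs with gold NLI labels (not from DanFEVER).
examples = [
    ("Huset er rødt.", "Huset har en farve.", "entailment"),
    ("Huset er rødt.", "Huset er blåt.", "contradiction"),
]

# Assumes id2label contains "entailment" / "neutral" / "contradiction".
label2id = {label.lower(): idx for idx, label in model.config.id2label.items()}

y_true, y_pred = [], []
for premise, hypothesis, gold in examples:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    y_pred.append(int(logits.argmax(dim=-1)))
    y_true.append(label2id[gold])

print("MCC:", matthews_corrcoef(y_true, y_pred))
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))
print("Accuracy:", accuracy_score(y_true, y_pred))
```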

### Swedish Evaluation

We use the test split of the machine translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Swedish performance of the models.

We acknowledge that evaluating on a machine translated test set rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Swedish.

| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | asd | asd | asd | 354M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 73.84% | 82.46% | 82.58% | 279M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 73.32% | 82.15% | 82.08% | 279M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |

### Norwegian Evaluation

We use the test split of the machine translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Norwegian performance of the models.

We acknowledge that evaluating on a machine translated test set rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Norwegian.

| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | asd | asd | asd | 354M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 65.33% | 76.73% | 76.65% | 279M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | asd | asd | asd | 178M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 65.18% | 76.76% | 76.77% | 279M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | asd | asd | asd | **22M** |

## Training procedure

The model has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.
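
A minimal sketch of how such a combined training set could be assembled with the `datasets` library is shown below. The dataset identifiers and column names are hypothetical placeholders, since this card does not specify where the machine translated corpora are hosted.

```python
# Sketch: composing a combined NLI training set with the datasets library.
# All dataset IDs and column names below are hypothetical placeholders.
from datasets import concatenate_datasets, load_dataset

sources = [
    "strombergnlp/danfever",            # hypothetical ID for DanFEVER
    "your-org/multinli-da-sv-nb",       # hypothetical ID for translated MultiNLI
    "your-org/commitmentbank-da-sv-nb", # hypothetical ID for translated CommitmentBank
    "your-org/fever-sv",                # hypothetical ID for FEVER translated into Swedish
    "your-org/anli-sv",                 # hypothetical ID for Adversarial NLI in Swedish
]

splits = []
for name in sources:
    ds = load_dataset(name, split="train")
    # Assumes each source exposes `premise`, `hypothesis` and `label` columns
    # with a shared label scheme; real corpora would need remapping first.
    splits.append(ds.select_columns(["premise", "hypothesis", "label"]))

train_dataset = concatenate_datasets(splits).shuffle(seed=4242)
print(train_dataset)
```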