cointegrated committed
Commit 097e5df
1 Parent(s): 872779e

Update README.md

Files changed (1)
  1. README.md +13 -25
README.md CHANGED
@@ -77,36 +77,17 @@ Some datasets obtained from the original sources:
 
 ## Performance
 
-The table below shows ROC AUC for three models on small samples of the DEV sets:
+The table below shows ROC AUC (one class vs rest) for five models on the corresponding *dev* sets:
 - [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli): a small BERT predicting entailment vs not_entailment
 - [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway): a base-sized BERT predicting entailment vs not_entailment
 - [threeway](https://huggingface.co/cointegrated/rubert-base-cased-nli-threeway) (**this model**): a base-sized BERT predicting entailment vs contradiction vs neutral
+- [vicgalle-xlm](https://huggingface.co/vicgalle/xlm-roberta-large-xnli-anli): a large multilingual NLI model
+- [facebook-bart](https://huggingface.co/facebook/bart-large-mnli): a large multilingual NLI model
 
-|model |tiny/entailment|twoway/entailment|threeway/entailment|threeway/contradiction|threeway/neutral|
-|-----------|---------------|-----------------|-------------------|-------------------------|-------------------|
-|add_one_rte|0.82 |0.90 |0.92 | | |
-|anli_r1 |0.50 |0.68 |0.66 |0.70 |0.75 |
-|anli_r2 |0.55 |0.62 |0.62 |0.62 |0.69 |
-|anli_r3 |0.50 |0.63 |0.59 |0.62 |0.64 |
-|copa |0.55 |0.60 |0.62 | | |
-|fever |0.88 |0.94 |0.94 |0.91 |0.92 |
-|help |0.74 |0.87 |0.46 | | |
-|iie |0.79 |0.85 |0.54 | | |
-|imppres |0.94 |0.99 |0.99 |0.99 |0.99 |
-|joci |0.87 |0.93 |0.93 |0.85 |0.80 |
-|mnli |0.87 |0.92 |0.93 |0.89 |0.86 |
-|monli |0.94 |1.00 |0.67 | | |
-|mpe |0.82 |0.90 |0.90 |0.91 |0.80 |
-|scitail |0.80 |0.96 |0.85 | | |
-|sick |0.97 |0.99 |0.99 |0.98 |0.96 |
-|snli |0.95 |0.98 |0.98 |0.99 |0.97 |
-|terra |0.73 |0.93 |0.93 | | |
-
-
-|m |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|mean |
-|------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------|
-|n |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128|
+
+|m |add_one_rte|anli_r1|anli_r2|anli_r3|copa|fever|help|iie |imppres|joci|mnli |monli|mpe |scitail|sick|snli|terra|total |
 |------------------------|-----------|-------|-------|-------|----|-----|----|-----|-------|----|-----|-----|----|-------|----|----|-----|------|
+|n_observations |387 |1000 |1000 |1200 |200 |20474|3355|31232|7661 |939 |19647|269 |1000|2126 |500 |9831|307 |101128|
 |tiny/entailment |0.77 |0.59 |0.52 |0.53 |0.53|0.90 |0.81|0.78 |0.93 |0.81|0.82 |0.91 |0.81|0.78 |0.93|0.95|0.67 |0.77 |
 |twoway/entailment |0.89 |0.73 |0.61 |0.62 |0.58|0.96 |0.92|0.87 |0.99 |0.90|0.90 |0.99 |0.91|0.96 |0.97|0.97|0.87 |0.86 |
 |threeway/entailment |0.91 |0.75 |0.61 |0.61 |0.57|0.96 |0.56|0.61 |0.99 |0.90|0.91 |0.67 |0.92|0.84 |0.98|0.98|0.90 |0.80 |
@@ -115,3 +96,10 @@ The table below shows ROC AUC for three models on small samples of the DEV sets:
 |threeway/contradiction | |0.71 |0.64 |0.61 | |0.97 | | |1.00 |0.77|0.92 | |0.89| |0.99|0.98| |0.85 |
 |threeway/neutral | |0.79 |0.70 |0.62 | |0.91 | | |0.99 |0.68|0.86 | |0.79| |0.96|0.96| |0.83 |
 
+For evaluation (and for training of the [tiny](https://huggingface.co/cointegrated/rubert-tiny-bilingual-nli) and [twoway](https://huggingface.co/cointegrated/rubert-base-cased-nli-twoway) models), some extra datasets were used:
+[Add-one RTE](https://cs.brown.edu/people/epavlick/papers/ans.pdf),
+[CoPA](https://people.ict.usc.edu/~gordon/copa.html),
+[IIE](https://aclanthology.org/I17-1100), and
+[SCITAIL](https://allenai.org/data/scitail) taken from [the repo of Felipe Salvatore](https://github.com/felipessalvatore/NLI_datasets) and translated,
+[HELP](https://github.com/verypluming/HELP) and [MoNLI](https://github.com/atticusg/MoNLI) taken from the original sources and translated,
+and Russian [TERRa](https://russiansuperglue.com/ru/tasks/task_info/TERRa).
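
The scores in the updated section are one-vs-rest ROC AUC, so each class of the threeway model is evaluated as a separate binary classifier. The sketch below illustrates that computation; it assumes the standard `transformers` and `scikit-learn` APIs, assumes the model config exposes the entailment/contradiction/neutral label names, and uses invented premise/hypothesis pairs and gold labels rather than any of the dev sets above.

```python
# Rough sketch: per-class probabilities from the threeway model and a
# one-vs-rest ROC AUC for the "entailment" class.
# The premise/hypothesis pairs and gold labels are toy examples, not dev data.
import torch
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cointegrated/rubert-base-cased-nli-threeway"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

pairs = [
    # (premise, hypothesis, gold label)
    ("Кошка спит на диване.", "Животное отдыхает.", "entailment"),
    ("Кошка спит на диване.", "Кошка бегает по двору.", "contradiction"),
    ("Кошка спит на диване.", "Диван стоит в гостиной.", "neutral"),
    ("Идёт сильный дождь.", "На улице мокро.", "entailment"),
]
premises = [p for p, _, _ in pairs]
hypotheses = [h for _, h, _ in pairs]
gold = [y for _, _, y in pairs]

with torch.no_grad():
    batch = tokenizer(premises, hypotheses, padding=True, truncation=True,
                      return_tensors="pt")
    probs = torch.softmax(model(**batch).logits, dim=-1)

# Take the class order from the model config instead of hard-coding it.
label2id = {name: idx for idx, name in model.config.id2label.items()}
ent = label2id["entailment"]

# One-vs-rest ROC AUC for "entailment": that class against the other two.
y_true = [int(y == "entailment") for y in gold]
y_score = probs[:, ent].tolist()
print("entailment ROC AUC:", roc_auc_score(y_true, y_score))
```

Reading the class order from `model.config.id2label` avoids guessing the column order of the logits, which the section itself does not spell out; the same one-vs-rest scoring applies to the contradiction and neutral columns of the table.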