Commit 4c60f0f by lighteternal — "Update from earendil" (parent: 71090fb)

File changed: README.md
|
---
language:
- el
- en
tags:
- xlm-roberta-base
datasets:
- multi_nli
- snli
- allnli_greek
metrics:
- accuracy
pipeline_tag: zero-shot-classification
widget:
- text: "Το Facebook κυκλοφόρησε τα πρώτα «έξυπνα» γυαλιά επαυξημένης πραγματικότητας"
  candidate_labels: "πολιτική, τεχνολογία, αθλητισμός"
  multi_class: false
license: apache-2.0
---

# Cross-Encoder for Greek Natural Language Inference (Textual Entailment) & Zero-Shot Classification

## By the Hellenic Army Academy (SSE) and the Technical University of Crete (TUC)

This model was trained using the [SentenceTransformers](https://sbert.net) [Cross-Encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html) class.

## Training Data

The model was trained on the combined Greek+English version of the AllNLI dataset (the sum of [SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/)). The Greek part was created using the EN2EL NMT model available [here](https://huggingface.co/lighteternal/SSE-TUC-mt-en-el-cased).
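For illustration only (this is not the authors' actual preprocessing script): an AllNLI-style combination amounts to concatenating the two English corpora and dropping pairs without a gold label. The toy records below are stand-ins for real SNLI/MultiNLI rows.

```python
# Toy stand-ins for SNLI/MultiNLI records; the real corpora live on the
# Hugging Face hub as "snli" and "multi_nli".
# Label convention in both: 0 = entailment, 1 = neutral, 2 = contradiction,
# and -1 marks examples without a gold label.
snli = [
    {"premise": "Two dogs run across the field.", "hypothesis": "Animals are moving.", "label": 0},
    {"premise": "A man inspects a uniform.", "hypothesis": "The man is sleeping.", "label": 2},
]
multi_nli = [
    {"premise": "The new rights are nice enough.", "hypothesis": "Everyone really likes them.", "label": -1},
]

# AllNLI = SNLI + MultiNLI, keeping only examples with a gold label
allnli = [ex for ex in snli + multi_nli if ex["label"] != -1]
print(len(allnli))  # 2
```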

The model can be used in two ways:
* NLI/Textual Entailment: For a given sentence pair, it will output three scores corresponding to the labels: contradiction, entailment, neutral.
* Zero-shot classification: For a given sentence and a set of candidate labels (topics), it will output the likelihood of the sentence belonging to each one (see the zero-shot example below).

## Performance

Evaluation on classification accuracy (entailment, contradiction, neutral) on the mixed (Greek+English) AllNLI-dev set:

| Metric | Value |
| --- | --- |
| Accuracy | 0.8409 |

Pre-trained models can be used like this:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder('lighteternal/nli-xlm-r-greek')

# Each tuple is a (premise, hypothesis) pair; English glosses in the comments
scores = model.predict([('Δύο άνθρωποι συναντιούνται στο δρόμο', 'Ο δρόμος έχει κόσμο'),  # "Two people meet in the street" / "The street is crowded"
                        ('Ένα μαύρο αυτοκίνητο ξεκινάει στη μέση του πλήθους.', 'Ένας άντρας οδηγάει σε ένα μοναχικό δρόμο'),  # "A black car starts off in the middle of the crowd" / "A man is driving on a lonely road"
                        ('Δυο γυναίκες μιλάνε στο κινητό', 'Το τραπέζι ήταν πράσινο')])  # "Two women are talking on the phone" / "The table was green"
# scores: one row per pair with three values for contradiction, entailment, neutral
```
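The score rows can be turned into label names by taking the per-pair argmax. A minimal sketch with made-up logits (no model download needed), using the contradiction/entailment/neutral label order described above:

```python
# Made-up logit rows for three sentence pairs; columns ordered as
# [contradiction, entailment, neutral], matching the label order above.
scores = [
    [0.1, 3.2, -1.0],
    [2.7, -0.3, 0.1],
    [-0.5, 0.2, 2.9],
]
label_names = ["contradiction", "entailment", "neutral"]

# argmax per row -> predicted label name
labels = [label_names[row.index(max(row))] for row in scores]
print(labels)  # ['entailment', 'contradiction', 'neutral']
```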

You can also use the model directly with the Transformers library (without SentenceTransformers):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained('lighteternal/nli-xlm-r-greek')
tokenizer = AutoTokenizer.from_pretrained('lighteternal/nli-xlm-r-greek')

features = tokenizer(['Δύο άνθρωποι συναντιούνται στο δρόμο', 'Ο δρόμος έχει κόσμο'],
                     ['Ένα μαύρο αυτοκίνητο ξεκινάει στη μέση του πλήθους.', 'Ένας άντρας οδηγάει σε ένα μοναχικό δρόμο.'],
                     padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    label_mapping = ['contradiction', 'entailment', 'neutral']
    labels = [label_mapping[score_max] for score_max in scores.argmax(dim=1)]
    print(labels)
```

This model can also be used for zero-shot classification:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model='lighteternal/nli-xlm-r-greek')

sent = "Το Facebook κυκλοφόρησε τα πρώτα «έξυπνα» γυαλιά επαυξημένης πραγματικότητας"  # "Facebook released the first 'smart' augmented-reality glasses"
candidate_labels = ["πολιτική", "τεχνολογία", "αθλητισμός"]  # "politics", "technology", "sports"

res = classifier(sent, candidate_labels)
print(res)  # labels ranked by descending score
```
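For context (standard Hugging Face pipeline behavior, not specific to this model): the zero-shot pipeline builds an NLI hypothesis from each candidate label and, with `multi_class: false`, softmax-normalizes the entailment scores so the labels compete against each other. A sketch of that ranking step with made-up logits:

```python
import math

# Made-up entailment logits, one per candidate label (hypothetical values)
entailment_logits = {"πολιτική": -1.2, "τεχνολογία": 2.5, "αθλητισμός": -0.4}

# With multi_class: false the pipeline softmaxes over the candidates,
# so the label probabilities sum to 1 and compete with each other
exp = {label: math.exp(v) for label, v in entailment_logits.items()}
total = sum(exp.values())
probs = {label: v / total for label, v in exp.items()}

ranked = sorted(probs, key=probs.get, reverse=True)
print(ranked[0])  # τεχνολογία
```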

The research work was supported by the Hellenic Foundation for Research and Innovation.

Citation for the Greek model TBA.
Based on the work [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
Kudos to @nreimers (Nils Reimers) for his support on GitHub.
|