# emrecan/bert-base-turkish-cased-mean-nli-stsb-tr
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search. The model was trained on Turkish machine-translated versions of the [NLI](https://huggingface.co/datasets/nli_tr) and [STS-b](https://huggingface.co/datasets/emrecan/stsb-mt-turkish) datasets, using the example [training scripts](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training) from the sentence-transformers GitHub repository.
## Usage (Sentence-Transformers)

With [sentence-transformers](https://www.SBERT.net) installed, you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

sentences = ["Bu örnek bir cümle", "Her cümle vektöre çevriliyor"]  # "This is an example sentence", "Every sentence is converted to a vector"

model = SentenceTransformer('emrecan/bert-base-turkish-cased-mean-nli-stsb-tr')
embeddings = model.encode(sentences)
print(embeddings)
```
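Since `model.encode` returns plain vectors, downstream tasks like clustering or semantic search reduce to vector comparisons. Below is a minimal sketch of the usual comparison step, cosine similarity, using NumPy stand-ins so it runs without downloading the model; real vectors would come from `model.encode(sentences)`:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by their norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in 768-dimensional vectors; real ones would come from model.encode(...)
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(768)
emb_b = emb_a + 0.1 * rng.standard_normal(768)  # slightly perturbed copy of emb_a

print(cosine_similarity(emb_a, emb_a))  # identical vectors score ~1.0
print(cosine_similarity(emb_a, emb_b))  # near-duplicates score close to 1.0
```

sentence-transformers also ships a `util.cos_sim` helper that performs the same computation over batches of embeddings.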

## Usage (HuggingFace Transformers)

```python
from transformers import AutoTokenizer, AutoModel

# Sentences we want sentence embeddings for
sentences = ["Bu örnek bir cümle", "Her cümle vektöre çevriliyor"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('emrecan/bert-base-turkish-cased-mean-nli-stsb-tr')
model = AutoModel.from_pretrained('emrecan/bert-base-turkish-cased-mean-nli-stsb-tr')
```
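The snippet above loads the raw BERT encoder, so its per-token embeddings still have to be pooled into a single sentence vector; as the model name suggests, this model uses mean pooling over the attention mask. A rough sketch of that operation, in NumPy for illustration (the `mean_pooling` helper in the standard sentence-transformers model card operates on PyTorch tensors):

```python
import numpy as np

def mean_pooling(token_embeddings, attention_mask):
    # Average token vectors, counting only positions where the mask is 1
    mask = attention_mask[..., None].astype(float)   # (batch, seq) -> (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]])  # (batch=1, seq=3, dim=2)
mask = np.array([[1, 1, 0]])                               # last position is padding
print(mean_pooling(tokens, mask))                          # -> [[2. 3.]]
```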
## Evaluation Results
Evaluation results on test and development sets are given below:
| Split | Epoch | cosine_pearson | cosine_spearman | euclidean_pearson | euclidean_spearman | manhattan_pearson | manhattan_spearman | dot_pearson | dot_spearman |
|------------|-------|----------------|-----------------|-------------------|--------------------|-------------------|--------------------|-------------|--------------|
| test | - | 0.834 | 0.830 | 0.820 | 0.819 | 0.819 | 0.818 | 0.799 | 0.789 |
| validation | 1     | 0.850          | 0.848           | 0.831             | 0.835              | 0.830             | 0.830              | 0.800       | 0.806        |
| validation | 2 | 0.857 | 0.857 | 0.844 | 0.848 | 0.844 | 0.848 | 0.813 | 0.810 |
| validation | 3 | 0.860 | 0.859 | 0.846 | 0.851 | 0.846 | 0.850 | 0.825 | 0.822 |
| validation | 4 | 0.859 | 0.860 | 0.846 | 0.851 | 0.846 | 0.851 | 0.825 | 0.823 |
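
Each `*_pearson` / `*_spearman` cell above is the correlation between the model's similarity scores (cosine, euclidean, manhattan, or dot) and the gold STS-b annotations. A small illustration of the Spearman variant on made-up scores (hypothetical data, not the actual evaluation):

```python
import numpy as np

def spearman(x, y):
    # Spearman correlation = Pearson correlation of the ranks (no tie handling)
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical model similarities vs. gold STS scores (0-5 scale)
model_scores = np.array([0.91, 0.15, 0.55, 0.72])
gold_scores = np.array([4.8, 0.5, 2.9, 3.6])
print(spearman(model_scores, gold_scores))  # rankings agree perfectly -> 1.0
```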
## Training
Training scripts [`training_nli_v2.py`](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py) and [`training_stsbenchmark_continue_training.py`](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) were used to train the model.
The model was trained with the parameters:
**DataLoader**: