---
pipeline_tag: sentence-similarity
language: fr
datasets:
- stsb_multi_mt
tags:
- Text
- Sentence Similarity
- Sentence-Embedding
- camembert-base
license: apache-2.0
model-index:
- name: sentence-flaubert-base by Van Tuan DANG
  results:
  - task: 
      name: Sentence-Embedding
      type: Text Similarity
    dataset:
      name: Text Similarity fr
      type: stsb_multi_mt
      args: fr
    metrics:
       - name: Test Pearson correlation coefficient
         type: Pearson_correlation_coefficient
         value:  xx.xx
---
## Pre-trained sentence embedding models: state-of-the-art sentence embeddings for French
This model is fine-tuned from the pre-trained [flaubert/flaubert_base_uncased](https://huggingface.co/flaubert/flaubert_base_uncased) using
[Siamese BERT-Networks with 'sentence-transformers'](https://www.sbert.net/), combined with Augmented SBERT, on the [stsb](https://huggingface.co/datasets/stsb_multi_mt/viewer/fr/train) dataset.


## Usage
The model can be used directly (without a language model) as follows:

```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Lajavaness/sentence-flaubert-base")
sentences = [
    "Un avion est en train de décoller.",
    "Un homme joue d'une grande flûte.",
    "Un homme étale du fromage râpé sur une pizza.",
    "Une personne jette un chat au plafond.",
    "Une personne est en train de plier un morceau de papier.",
]
embeddings = model.encode(sentences)
```
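
Since the model targets sentence similarity, a natural next step is to compare the resulting embeddings with cosine similarity. The following is a minimal sketch, not part of the original card: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the toy 3-dimensional vectors stand in for real embeddings so the snippet runs without downloading the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real sentence embeddings (assumption).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
ranking = rank_by_similarity(toy_embeddings[0], toy_embeddings[1:])
```

With real embeddings, the same helper could be called as `rank_by_similarity(embeddings[0], embeddings[1:])` to rank the remaining sentences against the first one.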