netapy committed on
Commit 2a0485d
1 Parent(s): 37356d6

Update README.md

Files changed (1)
  1. README.md +41 -51
README.md CHANGED
@@ -1,55 +1,45 @@
  ---
+ pipeline_tag: sentence-similarity
  tags:
- - generated_from_trainer
- model-index:
- - name: solon-large-06-BIG
-   results: []
+ - feature-extraction
+ - sentence-similarity
+ license: mit
+ language:
+ - fr
+ - en
  ---
  
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # solon-large-06-BIG
-
- This model was trained from scratch on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 1e-06
- - train_batch_size: 32
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - total_train_batch_size: 128
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 20.0
- - mixed_precision_training: Native AMP
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.35.2
- - Pytorch 2.1.1+cu121
- - Datasets 2.3.2
- - Tokenizers 0.15.0
+ # Solon Embeddings large 0.1
+
+ SOTA Open source french embedding model.
+
+ | Model | Mean Score |
+ | --- | --- |
+ | **OrdalieTech/Solon-embeddings-large-0.1** | 0.749 |
+ | cohere/embed-multilingual-v3 | 0.7402 |
+ | **OrdalieTech/Solon-embeddings-base-0.1** | 0.7306 |
+ | openai/ada-002 | 0.7290 |
+ | cohere/embed-multilingual-light-v3 | 0.6945 |
+ | antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
+ | dangvantuan/sentence-camembert-large | 0.6756 |
+ | voyage/voyage-01 | 0.6753 |
+ | intfloat/multilingual-e5-large | 0.6660 |
+ | intfloat/multilingual-e5-base | 0.6597 |
+ | Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
+ | dangvantuan/sentence-camembert-base | 0.5456 |
+ | EuropeanParliament/eubert_embedding_v1 | 0.5063 |
+
+ These results have been obtained through 9 french benchmarks on a variety of text similarity tasks (classification, reranking, STS) :
+ - AmazonReviewsClassification (MTEB)
+ - MassiveIntentClassification (MTEB)
+ - MassiveScenarioClassification (MTEB)
+ - MTOPDomainClassification (MTEB)
+ - MTOPIntentClassification (MTEB)
+ - STS22 (MTEB)
+ - MiraclFRRerank (Miracl)
+ - OrdalieFRSTS (Ordalie)
+ - OrdalieFRReranking (Ordalie)
+
+ We created OrdalieFRSTS and OrdalieFRReranking to enhance the benchmarking capabilities of French STS and reranking assessments.
+
+ (evaluation script available here : github.com/OrdalieTech/mteb)
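The new front matter sets `pipeline_tag: sentence-similarity`. With an embedding model, similarity between two texts is commonly scored as the cosine of the angle between their embedding vectors. The following is a minimal sketch of that scoring step in plain Python, assuming the embeddings have already been computed; the helper names `cosine_similarity` and `rank_by_similarity` and the toy three-dimensional vectors are hypothetical, and loading OrdalieTech/Solon-embeddings-large-0.1 itself is not shown.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy "embeddings"; real model outputs have far more dimensions.
query = [1.0, 0.0, 1.0]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
print(rank_by_similarity(query, candidates))  # → [1, 2, 0]
```

Because cosine similarity ignores vector length, only the direction of each embedding matters; if the vectors are L2-normalized beforehand, the same ranking can be obtained with a plain dot product.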
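Two of the nine benchmarks, STS22 (MTEB) and OrdalieFRSTS, are semantic textual similarity tasks. MTEB typically scores STS tasks by the Spearman rank correlation between the model's cosine similarities and human similarity labels; that OrdalieFRSTS is scored the same way is an assumption. A minimal pure-Python sketch of Spearman's rho, with tied values given the average of the ranks they span (the `gold` labels and `predicted` scores are hypothetical):

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: gold similarity labels and the model's cosine scores.
gold = [4.8, 1.0, 3.2, 0.5]
predicted = [0.91, 0.35, 0.72, 0.40]
print(round(spearman(gold, predicted), 3))  # → 0.8
```

Only the ordering of the predictions matters here, so a model is not penalized for producing cosine values on a different scale than the human labels.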
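MiraclFRRerank and OrdalieFRReranking are reranking tasks: for each query, the model orders a fixed set of candidate passages by similarity. MTEB reranking tasks are commonly reported with mean average precision (MAP); whether both French reranking sets use exactly this metric is an assumption. A minimal sketch of MAP over hypothetical per-query scores and relevance labels (the helper names are illustrative):

```python
from typing import Sequence, Tuple


def average_precision(scores: Sequence[float], relevant: Sequence[bool]) -> float:
    """AP for one query: sort candidates by descending score, then average
    the precision measured at the position of each relevant candidate."""
    ranked = sorted(zip(scores, relevant), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for position, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
        queries: Sequence[Tuple[Sequence[float], Sequence[bool]]]) -> float:
    """MAP: the mean of per-query AP over (scores, relevant) pairs."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(average_precision(s, r) for s, r in queries) / len(queries)


# Hypothetical reranking data: model scores for three candidates per query.
queries = [
    ([0.9, 0.2, 0.6], [True, False, False]),  # relevant doc ranked 1st -> AP 1.0
    ([0.3, 0.8, 0.5], [True, False, True]),   # relevant docs at ranks 2, 3 -> AP (1/2 + 2/3) / 2
]
print(round(mean_average_precision(queries), 4))  # → 0.7917
```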
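The README does not state how the Mean Score column aggregates the nine benchmarks. Assuming it is an unweighted arithmetic mean of the nine per-task main scores, the aggregation could look like the sketch below; the `mean_score` helper and the `BENCHMARKS` tuple are illustrative names, not part of the evaluation script at github.com/OrdalieTech/mteb.

```python
from statistics import fmean
from typing import Mapping

# The nine French benchmarks listed in the README.
BENCHMARKS = (
    "AmazonReviewsClassification",
    "MassiveIntentClassification",
    "MassiveScenarioClassification",
    "MTOPDomainClassification",
    "MTOPIntentClassification",
    "STS22",
    "MiraclFRRerank",
    "OrdalieFRSTS",
    "OrdalieFRReranking",
)


def mean_score(per_task: Mapping[str, float]) -> float:
    """Unweighted mean over the nine benchmarks (assumed aggregation)."""
    missing = [name for name in BENCHMARKS if name not in per_task]
    if missing:
        # Refuse to average over fewer tasks than the table claims.
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    return fmean(per_task[name] for name in BENCHMARKS)
```

Raising `KeyError` on a missing benchmark, instead of averaging whatever is present, keeps every model's number computed over the same nine tasks, which is what makes the rows of such a table comparable.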
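In the removed auto-generated card, `total_train_batch_size: 128` and `total_eval_batch_size: 32` follow from the per-device batch sizes multiplied by `num_devices: 4`, assuming no gradient accumulation (none is listed). `lr_scheduler_type: linear` denotes a learning rate that decays linearly to zero; the sketch mirrors the shape of Transformers' `get_linear_schedule_with_warmup`, with zero warmup steps by default since the card lists none. `linear_lr` is an illustrative re-implementation, and `TOTAL_STEPS` is hypothetical because the card gives neither the dataset size nor the step count.

```python
# Values copied from the removed model card.
TRAIN_BATCH_SIZE = 32   # per device
EVAL_BATCH_SIZE = 8     # per device
NUM_DEVICES = 4
GRADIENT_ACCUMULATION_STEPS = 1  # assumption: not listed in the card
BASE_LR = 1e-06

total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES
print(total_train_batch_size, total_eval_batch_size)  # → 128 32


def linear_lr(step: int, total_steps: int, warmup_steps: int = 0,
              base_lr: float = BASE_LR) -> float:
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Hypothetical run length: the real number of optimizer steps is unknown.
TOTAL_STEPS = 1000
for step in (0, 500, 1000):
    print(step, linear_lr(step, TOTAL_STEPS))
```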