metadata
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
license: mit
language:
- fr
- en
Solon Embeddings — large 0.1
State-of-the-art open-source French embedding model.
Instructions:
Prepend "query : " to each search query to improve retrieval performance.
Passages need no prefix.
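The prefix rule above can be sketched as follows. This is a minimal illustration, not an official usage snippet: the helper names `add_query_prefix` and `rank_passages` are hypothetical, and loading the model through the `sentence-transformers` library is an assumption that this card does not confirm.

```python
QUERY_PREFIX = "query : "  # prefix quoted in the instructions above


def add_query_prefix(query):
    """Prepend the query prefix unless it is already present."""
    if query.startswith(QUERY_PREFIX):
        return query
    return QUERY_PREFIX + query


def rank_passages(query, passages,
                  model_name="OrdalieTech/Solon-embeddings-large-0.1"):
    """Return (passage, score) pairs sorted by cosine similarity to the query.

    Assumption: the checkpoint loads with sentence-transformers.
    """
    # Imported lazily so the prefix helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # Only the query gets the prefix; passages are encoded as-is.
    q = model.encode([add_query_prefix(query)], normalize_embeddings=True)
    p = model.encode(list(passages), normalize_embeddings=True)
    # With unit-length vectors, the dot product equals cosine similarity.
    scores = (q @ p.T)[0]
    order = scores.argsort()[::-1]
    return [(passages[i], float(scores[i])) for i in order]
```

Calling `rank_passages("Quelle est la capitale de la France ?", ["Paris est la capitale de la France.", "Le chat dort."])` would download the model and return the passages ordered by similarity.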
Model | Mean Score |
---|---|
OrdalieTech/Solon-embeddings-large-0.1 | 0.7490 |
cohere/embed-multilingual-v3 | 0.7402 |
OrdalieTech/Solon-embeddings-base-0.1 | 0.7306 |
openai/ada-002 | 0.7290 |
cohere/embed-multilingual-light-v3 | 0.6945 |
antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
dangvantuan/sentence-camembert-large | 0.6756 |
voyage/voyage-01 | 0.6753 |
intfloat/multilingual-e5-large | 0.6660 |
intfloat/multilingual-e5-base | 0.6597 |
Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
dangvantuan/sentence-camembert-base | 0.5456 |
EuropeanParliament/eubert_embedding_v1 | 0.5063 |
These results were obtained on 9 French benchmarks covering a variety of text similarity tasks (classification, reranking, STS):
- AmazonReviewsClassification (MTEB)
- MassiveIntentClassification (MTEB)
- MassiveScenarioClassification (MTEB)
- MTOPDomainClassification (MTEB)
- MTOPIntentClassification (MTEB)
- STS22 (MTEB)
- MiraclFRRerank (Miracl)
- OrdalieFRSTS (Ordalie)
- OrdalieFRReranking (Ordalie)
We created OrdalieFRSTS and OrdalieFRReranking to broaden the benchmarks available for French STS and reranking.
(Evaluation script available here: github.com/OrdalieTech/mteb)
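As a rough sketch of how a "Mean Score" over the nine benchmarks could be aggregated, assuming an unweighted arithmetic mean of one main score per task (the card does not state the exact aggregation, and `mean_score` is a hypothetical helper, not part of the evaluation script):

```python
from statistics import mean

# The nine benchmarks listed above.
TASKS = [
    "AmazonReviewsClassification",
    "MassiveIntentClassification",
    "MassiveScenarioClassification",
    "MTOPDomainClassification",
    "MTOPIntentClassification",
    "STS22",
    "MiraclFRRerank",
    "OrdalieFRSTS",
    "OrdalieFRReranking",
]


def mean_score(task_scores):
    """Unweighted mean of one score per task (assumed aggregation)."""
    missing = [t for t in TASKS if t not in task_scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return mean(task_scores[t] for t in TASKS)


# Placeholder values only; these are NOT results for any model.
placeholder = {task: 0.5 for task in TASKS}
print(mean_score(placeholder))
```

Raising on a missing task keeps a partial run from being reported as a full nine-benchmark average.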