metadata
pipeline_tag: sentence-similarity
tags:
  - feature-extraction
  - sentence-similarity
license: mit
language:
  - fr
  - en

Solon Embeddings — large 0.1

State-of-the-art open-source French embedding model.

Instructions:
Add "query : " before each query to improve retrieval performance.
Passages need no prefix.
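
The prefixing rule above can be sketched as follows. This is a minimal illustration, not the authors' code: `format_query`, `cosine`, and `rank_passages` are hypothetical helper names, and `encode` stands for any function that maps a list of strings to a list of vectors.

```python
import math
from typing import Callable, List, Sequence, Tuple

QUERY_PREFIX = "query : "  # prefix given in the instructions above

# Assumption: an encoder is any callable mapping texts to vectors.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def format_query(query: str) -> str:
    """Prepend the query prefix unless it is already present."""
    return query if query.startswith(QUERY_PREFIX) else QUERY_PREFIX + query


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_passages(
    query: str, passages: Sequence[str], encode: Encoder
) -> List[Tuple[str, float]]:
    """Rank passages by similarity to the query, best first."""
    query_vec = encode([format_query(query)])[0]  # query gets the prefix
    passage_vecs = encode(list(passages))         # passages are left unchanged
    scored = [(p, cosine(query_vec, v)) for p, v in zip(passages, passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

If the checkpoint loads with the sentence-transformers library (an assumption, not something this card states), `encode` could be `SentenceTransformer("OrdalieTech/Solon-embeddings-large-0.1").encode`.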

| Model | Mean Score |
|---|---|
| OrdalieTech/Solon-embeddings-large-0.1 | 0.7490 |
| cohere/embed-multilingual-v3 | 0.7402 |
| OrdalieTech/Solon-embeddings-base-0.1 | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |

These results were obtained on 9 French benchmarks covering a variety of text similarity tasks (classification, reranking, STS):

  • AmazonReviewsClassification (MTEB)
  • MassiveIntentClassification (MTEB)
  • MassiveScenarioClassification (MTEB)
  • MTOPDomainClassification (MTEB)
  • MTOPIntentClassification (MTEB)
  • STS22 (MTEB)
  • MiraclFRRerank (Miracl)
  • OrdalieFRSTS (Ordalie)
  • OrdalieFRReranking (Ordalie)

We created OrdalieFRSTS and OrdalieFRReranking to broaden the benchmarks available for French STS and reranking.

(The evaluation script is available at github.com/OrdalieTech/mteb.)
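
As a rough sketch of how a single mean score could be derived from the 9 benchmarks listed above, assuming an unweighted arithmetic mean over each task's main metric (the card does not state the exact aggregation). `mean_score` is a hypothetical helper, and the values in the usage lines are placeholders, not reported results.

```python
from statistics import fmean
from typing import Mapping

# The 9 benchmarks named in this card.
TASKS = [
    "AmazonReviewsClassification",
    "MassiveIntentClassification",
    "MassiveScenarioClassification",
    "MTOPDomainClassification",
    "MTOPIntentClassification",
    "STS22",
    "MiraclFRRerank",
    "OrdalieFRSTS",
    "OrdalieFRReranking",
]


def mean_score(task_scores: Mapping[str, float]) -> float:
    """Unweighted mean over all 9 tasks (assumed aggregation)."""
    missing = [t for t in TASKS if t not in task_scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return fmean(task_scores[t] for t in TASKS)


# Placeholder values for illustration only; these are NOT measured results.
placeholder = {task: 0.5 for task in TASKS}
print(round(mean_score(placeholder), 4))
```

Requiring every task to be present avoids silently averaging over a subset, which would make scores from different models incomparable.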