Solon Embeddings — large 0.1

State-of-the-art open-source French embedding model.

Instructions:
Prepend "query : " to each query to improve retrieval performance.
Passages need no prefix.
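
The prefix rule above can be sketched as a small helper. This is a minimal illustration, not the authors' code: the function names `prepare_query` and `prepare_passage` and the sample French strings are hypothetical, and the only detail taken from this card is the literal prefix "query : ".

```python
# Literal prefix quoted in the instructions above (note the space before the colon).
QUERY_PREFIX = "query : "


def prepare_query(text: str) -> str:
    """Prepend the retrieval prefix to a query unless it is already present."""
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def prepare_passage(text: str) -> str:
    """Passages are passed through unchanged."""
    return text


# Hypothetical sample inputs, for illustration only.
queries = [prepare_query(q) for q in ["Quelle est la capitale de la France ?"]]
passages = [prepare_passage(p) for p in ["Paris est la capitale de la France."]]

print(queries)
print(passages)
```

The prepared strings would then be passed to whatever encoder loads the model. Which library and call to use is not stated in this card, so that step is left out of the sketch.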

Model                                           Mean Score
OrdalieTech/Solon-embeddings-large-0.1          0.7490
cohere/embed-multilingual-v3                    0.7402
OrdalieTech/Solon-embeddings-base-0.1           0.7306
openai/ada-002                                  0.7290
cohere/embed-multilingual-light-v3              0.6945
antoinelouis/biencoder-camembert-base-mmarcoFR  0.6826
dangvantuan/sentence-camembert-large            0.6756
voyage/voyage-01                                0.6753
intfloat/multilingual-e5-large                  0.6660
intfloat/multilingual-e5-base                   0.6597
Sbert/paraphrase-multilingual-mpnet-base-v2     0.5975
dangvantuan/sentence-camembert-base             0.5456
EuropeanParliament/eubert_embedding_v1          0.5063

These results were obtained on 9 French benchmarks covering a variety of text similarity tasks (classification, reranking, STS):

  • AmazonReviewsClassification (MTEB)
  • MassiveIntentClassification (MTEB)
  • MassiveScenarioClassification (MTEB)
  • MTOPDomainClassification (MTEB)
  • MTOPIntentClassification (MTEB)
  • STS22 (MTEB)
  • MiraclFRRerank (Miracl)
  • OrdalieFRSTS (Ordalie)
  • OrdalieFRReranking (Ordalie)

We created OrdalieFRSTS and OrdalieFRReranking to broaden the benchmark coverage available for French STS and reranking.

(The evaluation script is available at github.com/OrdalieTech/mteb.)

Model size: 560M params (Safetensors, tensor type F32)