INFO: The model is being continuously updated.

This model is a multilingual-e5-base model fine-tuned for semantic textual similarity (STS).
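A minimal usage sketch follows, assuming the model is loaded through the sentence-transformers library under the ID danielheinz/e5-base-sts-en-de (the ID shown on this page). The helper names `cosine_similarity` and `score_pair` are hypothetical and chosen for illustration only. Base E5 models expect a "query: " prefix on their inputs; this card does not say whether the fine-tuned model needs one, so the sketch passes the sentences unchanged.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_pair(sentence_a, sentence_b, model_id="danielheinz/e5-base-sts-en-de"):
    """Embed two sentences and return their cosine similarity.

    Assumption: the model loads with sentence-transformers. The import is
    lazy so that cosine_similarity stays usable without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # downloads the weights on first use
    emb_a, emb_b = model.encode([sentence_a, sentence_b])
    return cosine_similarity(emb_a.tolist(), emb_b.tolist())
```

Calling `score_pair("Das Wetter ist schön.", "The weather is nice.")` would return a single similarity score. For many pairs, loading the model once and encoding in batches is the more sensible design.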

Model Training

The model has been fine-tuned on the German subsets of the following datasets:

The training procedure can be divided into two stages:

  • training on paraphrase datasets with the Multiple Negatives Ranking Loss
  • training on semantic textual similarity datasets using the Cosine Similarity Loss
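To make the two objectives concrete, here is a dependency-free sketch of what each loss computes on already-embedded sentences. It is an illustration, not the training code used for this model. The function names are hypothetical, and `scale=20.0` is an assumption taken from the default in sentence-transformers' implementation of the Multiple Negatives Ranking Loss.

```python
import math


def cos(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Stage 1 objective: positives[i] is the paraphrase of anchors[i], and
    every other positive in the batch serves as a negative. The loss is the
    mean cross-entropy over the scaled cosine similarities."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def cosine_similarity_loss(pairs, gold_scores):
    """Stage 2 objective: mean squared error between the cosine similarity
    of each sentence pair and its gold similarity score."""
    errors = [(cos(a, b) - g) ** 2 for (a, b), g in zip(pairs, gold_scores)]
    return sum(errors) / len(errors)
```

The first loss only needs paraphrase pairs without labels, which is why it suits the paraphrase stage; the second needs graded similarity scores, which STS datasets provide.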

Results

The model achieves the following results:

  • 0.920 on stsb's validation subset
  • 0.904 on stsb's test subset
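
The card does not state which metric these numbers refer to. STS benchmarks are commonly scored with the Spearman rank correlation between the model's cosine similarities and the gold scores; assuming that convention, the computation can be sketched as follows. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(predicted, gold):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(gold))
```

Because only the ranks matter, this score rewards a model for ordering sentence pairs correctly, regardless of the absolute similarity values it produces.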

Datasets used to train danielheinz/e5-base-sts-en-de
