SentenceTransformer based on intfloat/multilingual-e5-base

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-base on the rztk/rozetka_positive_pairs dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Training Dataset: rztk/rozetka_positive_pairs

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
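
The three modules are an XLM-RoBERTa encoder, mean pooling over the token embeddings, and L2 normalization (so dot product and cosine similarity coincide on the outputs). Below is a minimal sketch of the same pipeline written against plain transformers, loading the base model for illustration; the fine-tuned checkpoint has the identical architecture:

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-base")
encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-base")

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq, 768)
    # (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # (2) Normalize: scale each vector to unit length
    return F.normalize(pooled, p=2, dim=1)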

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rztk-bohdanbilonoh/multilingual-e5-base-test")
# Run inference
sentences = [
    # Russian query: "sippy cup for children"
    'поилка для детей',
    # Relevant document: tagged Ukrainian product attributes for a NUK sippy cup
    "<category>Поїльники та непроливайки</category><brand>Nuk</brand><options><option_title>Стать дитини</option_title><option_value>Хлопчик</option_value><option_title>Стать дитини</option_title><option_value>Дівчинка</option_value><option_title>Кількість вантажних місць</option_title><option_value>1</option_value><option_title>Країна реєстрації бренда</option_title><option_value>Німеччина</option_value><option_title>Країна-виробник товару</option_title><option_value>Німеччина</option_value><option_title>Об'єм, мл</option_title><option_value>300</option_value><option_title>Матеріал</option_title><option_value>Пластик</option_value><option_title>Колір</option_title><option_value>Блакитний</option_value><option_title>Тип</option_title><option_value>Поїльник</option_value><option_title>Тип гарантійного талона</option_title><option_value>Гарантія по чеку</option_value><option_title>Доставка Premium</option_title></options>",
    # Unrelated document: a hinged-door wardrobe title
    'Шафа розпашній Fenster Оксфорд Лагуна',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
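
Because training used Matryoshka dimensions of 768, 512, 256, and 128 (see the loss configuration under Training Details), the embeddings can also be truncated for cheaper storage and search, at some cost in retrieval quality (see the per-dimension metrics below). A minimal sketch using the truncate_dim argument:

# Keep only the first 256 dimensions, one of the trained Matryoshka sizes
model_256 = SentenceTransformer("rztk-bohdanbilonoh/multilingual-e5-base-test", truncate_dim=256)
embeddings_256 = model_256.encode(sentences)
print(embeddings_256.shape)
# [3, 256]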

Evaluation

Metrics

Information Retrieval

  • Dataset: rusisms-uk-title

| Metric           | Value  |
|:-----------------|:-------|
| dot_accuracy@1   | 0.5429 |
| dot_accuracy@3   | 0.6889 |
| dot_accuracy@5   | 0.7492 |
| dot_accuracy@10  | 0.8    |
| dot_precision@1  | 0.5429 |
| dot_precision@3  | 0.5217 |
| dot_precision@5  | 0.5035 |
| dot_precision@10 | 0.4768 |
| dot_recall@1     | 0.0092 |
| dot_recall@3     | 0.0238 |
| dot_recall@5     | 0.0351 |
| dot_recall@10    | 0.0599 |
| dot_ndcg@10      | 0.4937 |
| dot_mrr@10       | 0.6287 |
| dot_map@100      | 0.1404 |

Information Retrieval

  • Dataset: rusisms-uk-title, matryoshka_dim = 768

| Metric          | Value  |
|:----------------|:-------|
| dot_accuracy@1  | 0.1619 |
| dot_precision@1 | 0.1619 |
| dot_recall@1    | 0.002  |
| dot_ndcg@1      | 0.1619 |
| dot_mrr@1       | 0.1619 |
| dot_map@100     | 0.0213 |

Information Retrieval

  • Dataset: rusisms-uk-title, matryoshka_dim = 512

| Metric          | Value  |
|:----------------|:-------|
| dot_accuracy@1  | 0.146  |
| dot_precision@1 | 0.146  |
| dot_recall@1    | 0.0017 |
| dot_ndcg@1      | 0.146  |
| dot_mrr@1       | 0.146  |
| dot_map@100     | 0.0152 |

Information Retrieval

  • Dataset: rusisms-uk-title, matryoshka_dim = 256

| Metric          | Value  |
|:----------------|:-------|
| dot_accuracy@1  | 0.1016 |
| dot_precision@1 | 0.1016 |
| dot_recall@1    | 0.0013 |
| dot_ndcg@1      | 0.1016 |
| dot_mrr@1       | 0.1016 |
| dot_map@100     | 0.012  |

Information Retrieval

  • Dataset: rusisms-uk-title, matryoshka_dim = 128

| Metric          | Value  |
|:----------------|:-------|
| dot_accuracy@1  | 0.054  |
| dot_precision@1 | 0.054  |
| dot_recall@1    | 0.0007 |
| dot_ndcg@1      | 0.054  |
| dot_mrr@1       | 0.054  |
| dot_map@100     | 0.0054 |
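
These tables report the standard sentence-transformers InformationRetrievalEvaluator metrics under dot-product similarity; the matryoshka_dim variants evaluate embeddings truncated to the given dimension. A hedged sketch of running such an evaluation, reusing model from the usage example above; the queries, corpus, and relevance judgments here are toy placeholders, not the actual rusisms-uk-title data:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

queries = {"q1": "поилка для детей"}
corpus = {
    "d1": "Поїльник NUK, 300 мл, пластик",        # relevant product
    "d2": "Шафа розпашній Fenster Оксфорд Лагуна",  # distractor
}
relevant_docs = {"q1": {"d1"}}

ir_evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="rusisms-uk-title",
)
results = ir_evaluator(model)
# results contains keys such as "rusisms-uk-title_dot_accuracy@1"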

Training Details

Training Dataset

rztk/rozetka_positive_pairs

  • Dataset: rztk/rozetka_positive_pairs
  • Size: 44,800 training samples
  • Columns: query and text
  • Approximate statistics based on the first 1000 samples:

    |         | query                                            | text                                                |
    |:--------|:-------------------------------------------------|:----------------------------------------------------|
    | type    | string                                           | string                                              |
    | details | min: 3 tokens, mean: 7.18 tokens, max: 16 tokens | min: 9 tokens, mean: 158.88 tokens, max: 512 tokens |

  • Samples:

    | query     | text |
    |:----------|:-----|
    | p smart z | TPU чехол Ultrathin Series 0,33 mm для Huawei P Smart Z Безбарвний (прозорий) |
    | p smart z | <category>Чохли для мобільних телефонів</category><options><option_title>Матеріал</option_title><option_value>Силікон</option_value><option_title>Колір</option_title><option_value>Transparent</option_value><option_title>Сумісна модель</option_title><option_value>P Smart Z</option_value></options> |
    | p smart z | TPU чехол Ultrathin Series 0,33mm для Huawei P Smart Z Бесцветный (прозрачный) |
  • Loss: sentence_transformers_training.model.matryoshka2d_loss.RZTKMatryoshka2dLoss with these parameters:
    {
        "loss": "RZTKMultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
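
RZTKMatryoshka2dLoss is a project-specific wrapper, but its parameters mirror the stock Matryoshka2dLoss from sentence-transformers, which combines AdaptiveLayerLoss and MatryoshkaLoss around an inner ranking loss. A hedged sketch of the equivalent setup with library losses, using MultipleNegativesRankingLoss as a stand-in for the custom RZTK variant:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import Matryoshka2dLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("intfloat/multilingual-e5-base")
inner_loss = MultipleNegativesRankingLoss(model)  # stand-in for RZTKMultipleNegativesRankingLoss
loss = Matryoshka2dLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128],
    matryoshka_weights=[1, 1, 1, 1],
    n_layers_per_step=1,
    n_dims_per_step=1,
    last_layer_weight=1.0,
    prior_layers_weight=1.0,
    kl_div_weight=1.0,
    kl_temperature=0.3,
)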
    

Evaluation Dataset

rztk/rozetka_positive_pairs

  • Dataset: rztk/rozetka_positive_pairs
  • Size: 4,480 evaluation samples
  • Columns: query and text
  • Approximate statistics based on the first 1000 samples:

    |         | query                                            | text                                                 |
    |:--------|:-------------------------------------------------|:-----------------------------------------------------|
    | type    | string                                           | string                                               |
    | details | min: 3 tokens, mean: 6.29 tokens, max: 11 tokens | min: 12 tokens, mean: 161.36 tokens, max: 512 tokens |

  • Samples:

    | query           | text |
    |:----------------|:-----|
    | кошелек женский | Портмоне BAELLERRY Forever N2345 Черный (020354) |
    | кошелек женский | <category>Гаманці</category><brand>Baellerry</brand><options><option_title>Для кого</option_title><option_value>Для жінок</option_value><option_title>Вид</option_title><option_value>Портмоне</option_value><option_title>Матеріал</option_title><option_value>Штучна шкіра</option_value><option_title>Країна-виробник товару</option_title><option_value>Китай</option_value></options> |
    | кошелек женский | Портмоне BAELLERRY Forever N2345 Черный (020354) |
  • Loss: sentence_transformers_training.model.matryoshka2d_loss.RZTKMatryoshka2dLoss with these parameters:
    {
        "loss": "RZTKMultipleNegativesRankingLoss",
        "n_layers_per_step": 1,
        "last_layer_weight": 1.0,
        "prior_layers_weight": 1.0,
        "kl_div_weight": 1.0,
        "kl_temperature": 0.3,
        "matryoshka_dims": [
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": 1
    }
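
Both splits come from the rztk/rozetka_positive_pairs dataset on the Hub. A hedged sketch of loading it with datasets; the split name and the 10% hold-out below are assumptions inferred from the 44,800/4,480 sample counts, not a documented recipe:

from datasets import load_dataset

dataset = load_dataset("rztk/rozetka_positive_pairs", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)  # assumed hold-out
train_dataset, eval_dataset = splits["train"], splits["test"]
print(train_dataset.column_names)  # ['query', 'text']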
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 112
  • per_device_eval_batch_size: 112
  • torch_empty_cache_steps: 30
  • learning_rate: 2e-05
  • num_train_epochs: 1.0
  • warmup_ratio: 0.1
  • bf16: True
  • bf16_full_eval: True
  • tf32: True
  • dataloader_num_workers: 2
  • load_best_model_at_end: True
  • optim: adafactor
  • push_to_hub: True

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 112
  • per_device_eval_batch_size: 112
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: 30
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: True
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 2
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adafactor
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • ddp_static_graph: False
  • ddp_comm_hook: bf16
  • gradient_as_bucket_view: False

Training Logs

All dot_map@100 columns report the rusisms-uk-title evaluator; the dim-N columns correspond to the rusisms-uk-title--matryoshka_dim-N-- metric prefixes, and the last column to rusisms-uk-title_dot_map@100.

| Epoch   | Step   | Training Loss | loss       | dim-128 | dim-256 | dim-512 | dim-768 | rusisms-uk-title |
|:--------|:-------|:--------------|:-----------|:--------|:--------|:--------|:--------|:-----------------|
| 0.1     | 10     | 6.6103        | -          | -       | -       | -       | -       | -                |
| 0.2     | 20     | 5.524         | -          | -       | -       | -       | -       | -                |
| 0.3     | 30     | 4.759         | 3.6444     | -       | -       | -       | -       | -                |
| 0.4     | 40     | 4.5195        | -          | -       | -       | -       | -       | -                |
| 0.5     | 50     | 3.6598        | -          | -       | -       | -       | -       | -                |
| 0.6     | 60     | 3.7912        | 2.8962     | -       | -       | -       | -       | -                |
| 0.7     | 70     | 3.9935        | -          | -       | -       | -       | -       | -                |
| 0.8     | 80     | 3.3929        | -          | -       | -       | -       | -       | -                |
| **0.9** | **90** | **3.6101**    | **2.6889** | -       | -       | -       | -       | -                |
| 1.0     | 100    | 3.8753        | -          | 0.0054  | 0.0120  | 0.0152  | 0.0213  | 0.1404           |

  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 3.0.1
  • Transformers: 4.45.1
  • PyTorch: 2.4.1
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.20.0
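
To reproduce this environment, the listed versions can be pinned directly; exact CUDA builds of PyTorch may differ per platform:

pip install sentence-transformers==3.0.1 transformers==4.45.1 torch==2.4.1 accelerate==0.34.2 datasets==3.0.0 tokenizers==0.20.0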

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}