bge-base-en-v1.5-klej-dyk

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
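
The pipeline above takes the [CLS] token vector as the sentence embedding (`pooling_mode_cls_token: True`) and then L2-normalizes it. A minimal sketch of those two steps in plain Python follows; the toy 4-dimensional token vectors and the helper names `cls_pool` and `l2_normalize` are illustrative assumptions, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Pooling step: keep only the first token's vector ([CLS])."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Normalize step: scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy transformer output: 3 tokens x 4 dims (the real model emits 768 dims per token).
tokens = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(tokens))
print(sentence_embedding)
```

Because the output has unit length, the dot product of two embeddings equals their cosine similarity.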

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Herkules na rozstajach',
    'jak zinterpretować wymowę obrazu Herkules na rozstajach?',
    'Dowódcą grupy był Wiaczesław Razumowicz ps. „Chmara”.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
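
The model was trained with MatryoshkaLoss over dimensions 768, 512, 256, 128 and 64 (see Training Details), so embeddings can be shortened to one of those sizes at some cost in retrieval quality. The sketch below shows the idea on toy vectors: keep the leading components, re-normalize, and compare with cosine similarity. The helper names and the 4-dimensional vectors are assumptions for illustration; with the real model, the `truncate_dim` argument of `SentenceTransformer` should give a similar result.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

# Toy "full" embeddings; real ones have 768 components.
full_a = [0.5, 0.5, 0.5, 0.5]
full_b = [0.5, 0.5, -0.5, 0.5]

small_a = truncate_and_normalize(full_a, 2)
small_b = truncate_and_normalize(full_b, 2)

print(cosine(full_a, full_b))    # similarity using all components
print(cosine(small_a, small_b))  # similarity using the first 2 components only
```

Truncated similarities generally differ from full-size ones, which is why the Evaluation section reports metrics separately per dimension.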

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1731
cosine_accuracy@3 0.4615
cosine_accuracy@5 0.6226
cosine_accuracy@10 0.7356
cosine_precision@1 0.1731
cosine_precision@3 0.1538
cosine_precision@5 0.1245
cosine_precision@10 0.0736
cosine_recall@1 0.1731
cosine_recall@3 0.4615
cosine_recall@5 0.6226
cosine_recall@10 0.7356
cosine_ndcg@10 0.4434
cosine_mrr@10 0.3505
cosine_map@100 0.3574

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1683
cosine_accuracy@3 0.4519
cosine_accuracy@5 0.601
cosine_accuracy@10 0.7091
cosine_precision@1 0.1683
cosine_precision@3 0.1506
cosine_precision@5 0.1202
cosine_precision@10 0.0709
cosine_recall@1 0.1683
cosine_recall@3 0.4519
cosine_recall@5 0.601
cosine_recall@10 0.7091
cosine_ndcg@10 0.4296
cosine_mrr@10 0.3406
cosine_map@100 0.3485

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1923
cosine_accuracy@3 0.4543
cosine_accuracy@5 0.5913
cosine_accuracy@10 0.6899
cosine_precision@1 0.1923
cosine_precision@3 0.1514
cosine_precision@5 0.1183
cosine_precision@10 0.069
cosine_recall@1 0.1923
cosine_recall@3 0.4543
cosine_recall@5 0.5913
cosine_recall@10 0.6899
cosine_ndcg@10 0.4311
cosine_mrr@10 0.3488
cosine_map@100 0.3561

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1635
cosine_accuracy@3 0.4159
cosine_accuracy@5 0.5168
cosine_accuracy@10 0.5986
cosine_precision@1 0.1635
cosine_precision@3 0.1386
cosine_precision@5 0.1034
cosine_precision@10 0.0599
cosine_recall@1 0.1635
cosine_recall@3 0.4159
cosine_recall@5 0.5168
cosine_recall@10 0.5986
cosine_ndcg@10 0.3764
cosine_mrr@10 0.3052
cosine_map@100 0.3152

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.1659
cosine_accuracy@3 0.351
cosine_accuracy@5 0.4399
cosine_accuracy@10 0.5288
cosine_precision@1 0.1659
cosine_precision@3 0.117
cosine_precision@5 0.088
cosine_precision@10 0.0529
cosine_recall@1 0.1659
cosine_recall@3 0.351
cosine_recall@5 0.4399
cosine_recall@10 0.5288
cosine_ndcg@10 0.3382
cosine_mrr@10 0.278
cosine_map@100 0.2877
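
In every table above, cosine_recall@k equals cosine_accuracy@k, and cosine_precision@k equals cosine_accuracy@k divided by k (for example 0.4615 / 3 ≈ 0.1538). That pattern is what one would expect when each query has exactly one relevant passage; this is an inference from the numbers, not something the card states. The sketch below computes the same family of metrics under that assumption, using made-up document IDs and a hypothetical `ir_metrics` helper.

```python
def ir_metrics(ranked_ids, relevant_id, k):
    """Per-query metrics at cutoff k, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = 1.0 if relevant_id in top_k else 0.0
    reciprocal_rank = 1.0 / (top_k.index(relevant_id) + 1) if hit else 0.0
    return {
        "accuracy": hit,        # is the relevant document in the top k?
        "precision": hit / k,   # at most one of the k results can be relevant
        "recall": hit,          # one relevant document, so recall is 0 or 1
        "mrr": reciprocal_rank,
    }

def mean_metrics(queries, k):
    """Average the per-query metrics over all (ranking, relevant_id) pairs."""
    rows = [ir_metrics(ranking, relevant, k) for ranking, relevant in queries]
    return {name: sum(r[name] for r in rows) / len(rows) for name in rows[0]}

# Hypothetical rankings: (retrieved IDs in score order, the one relevant ID).
queries = [
    (["d1", "d2", "d3"], "d1"),  # relevant at rank 1
    (["d4", "d5", "d6"], "d6"),  # relevant at rank 3
    (["d7", "d8", "d9"], "d0"),  # relevant document not retrieved
    (["d2", "d1", "d3"], "d1"),  # relevant at rank 2
]

results = mean_metrics(queries, k=3)
print(results)
```

With a single relevant document per query, precision@k is capped at 1/k, so low precision values at k = 10 do not by themselves indicate poor ranking.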

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,738 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column    type    min        mean          max
    positive  string  6 tokens   90.01 tokens  512 tokens
    anchor    string  10 tokens  30.82 tokens  76 tokens
  • Samples:
    1. positive: Londyńska premiera w Ambassadors Theatre na londyńskim West Endzie miała miejsce 25 listopada 1952 roku, a przedstawione grane jest do dziś (od 1974 r.) w sąsiednim St Martin's Theatre. W Polsce była wystawiana m.in. w Teatrze Nowym w Zabrzu.
       anchor: w którym londyńskim muzeum wystawiana była instalacja My Bed?
    2. positive: Theridion grallator osiąga długość 5 mm. U niektórych postaci na żółtym odwłoku występuje wzór przypominający uśmiechniętą lub śmiejącą się twarz klowna.
       anchor: które pająki noszą na grzbiecie wzór przypominający uśmiechniętego klauna?
    3. positive: W 1998 w wyniku sporów o wytyczenie granicy między dwoma państwami wybuchła wojna erytrejsko-etiopska. Zakończyła się porozumieniem zawartym w Algierze 12 grudnia 2000. Od tego czasu strefa graniczna jest patrolowana przez siły pokojowe ONZ.
       anchor: jakie były skutki wojny erytrejsko-etiopskiej?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
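
To make the configuration above concrete: MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor, and MatryoshkaLoss applies that loss to embeddings truncated to each listed dimension and adds up the weighted results. The following is a simplified pure-Python sketch of that idea, not the library's implementation; the toy vectors, the dimensions `[4, 2]`, and the scale of 20 (assumed here to mirror the library default) are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives: cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over embeddings truncated to each dimension."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        short_anchors = [a[:dim] for a in anchors]
        short_positives = [p[:dim] for p in positives]
        total += weight * mnr_loss(short_anchors, short_positives)
    return total

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Equal weights, as in the configuration above, mean the 64-dimensional prefix contributes as much to the objective as the full 768-dimensional embedding.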
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
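
The hyperparameters above imply a fairly short run. The sketch below works through the arithmetic, assuming a single device and that the Trainer floors the number of optimizer steps per epoch; both are assumptions, though the results agree with the Training Logs (epoch 0.0684 at step 1, 140 steps in total). The `lr_at` function follows the usual linear-warmup, cosine-decay formula and is an approximation of the scheduler rather than its source code.

```python
import math

# Values taken from this card.
num_samples = 3738
per_device_batch = 16
grad_accum = 16
epochs = 10
warmup_ratio = 0.1
base_lr = 2e-5

effective_batch = per_device_batch * grad_accum                 # samples per optimizer step
batches_per_epoch = math.ceil(num_samples / per_device_batch)   # forward passes per epoch
updates_per_epoch = batches_per_epoch // grad_accum             # optimizer steps per epoch (floored)
total_steps = epochs * updates_per_epoch
warmup_steps = math.ceil(total_steps * warmup_ratio)

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return step * grad_accum / batches_per_epoch

def lr_at(step):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, batches_per_epoch, total_steps, warmup_steps)
print(round(epoch_at(1), 4), round(epoch_at(total_steps), 4))
print(lr_at(warmup_steps), lr_at(total_steps))
```

One consequence of the flooring is that the run ends at roughly epoch 9.57 rather than exactly 10.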

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0684 1 7.2706 - - - - -
0.1368 2 8.2776 - - - - -
0.2051 3 7.1399 - - - - -
0.2735 4 6.6905 - - - - -
0.3419 5 6.735 - - - - -
0.4103 6 7.0537 - - - - -
0.4786 7 6.871 - - - - -
0.5470 8 6.7277 - - - - -
0.6154 9 5.9853 - - - - -
0.6838 10 6.0518 - - - - -
0.7521 11 5.8291 - - - - -
0.8205 12 5.0064 - - - - -
0.8889 13 4.8572 - - - - -
0.9573 14 5.1899 0.2812 0.3335 0.3486 0.2115 0.3639
1.0256 15 4.2996 - - - - -
1.0940 16 4.1475 - - - - -
1.1624 17 4.6174 - - - - -
1.2308 18 4.394 - - - - -
1.2991 19 4.0255 - - - - -
1.3675 20 3.9722 - - - - -
1.4359 21 3.9509 - - - - -
1.5043 22 3.7674 - - - - -
1.5726 23 3.7572 - - - - -
1.6410 24 3.9463 - - - - -
1.7094 25 3.7151 - - - - -
1.7778 26 3.7771 - - - - -
1.8462 27 3.5228 - - - - -
1.9145 28 2.7906 - - - - -
1.9829 29 3.4555 0.3164 0.3529 0.3641 0.2636 0.3681
2.0513 30 2.737 - - - - -
2.1197 31 3.1976 - - - - -
2.1880 32 3.1363 - - - - -
2.2564 33 2.9706 - - - - -
2.3248 34 2.9629 - - - - -
2.3932 35 2.7226 - - - - -
2.4615 36 2.4378 - - - - -
2.5299 37 2.7201 - - - - -
2.5983 38 2.6802 - - - - -
2.6667 39 3.1613 - - - - -
2.7350 40 2.9344 - - - - -
2.8034 41 2.5254 - - - - -
2.8718 42 2.5617 - - - - -
2.9402 43 2.459 0.3197 0.3571 0.3640 0.2739 0.3733
3.0085 44 2.3785 - - - - -
3.0769 45 1.9408 - - - - -
3.1453 46 2.7095 - - - - -
3.2137 47 2.4774 - - - - -
3.2821 48 2.2178 - - - - -
3.3504 49 2.0884 - - - - -
3.4188 50 2.1044 - - - - -
3.4872 51 2.1504 - - - - -
3.5556 52 2.1177 - - - - -
3.6239 53 2.2283 - - - - -
3.6923 54 2.3964 - - - - -
3.7607 55 2.0972 - - - - -
3.8291 56 2.0961 - - - - -
3.8974 57 1.783 - - - - -
3.9658 58 2.1031 0.3246 0.3533 0.3603 0.2829 0.3687
4.0342 59 1.6699 - - - - -
4.1026 60 1.6675 - - - - -
4.1709 61 2.1672 - - - - -
4.2393 62 1.8881 - - - - -
4.3077 63 1.701 - - - - -
4.3761 64 1.9154 - - - - -
4.4444 65 1.4549 - - - - -
4.5128 66 1.5444 - - - - -
4.5812 67 1.8352 - - - - -
4.6496 68 1.7908 - - - - -
4.7179 69 1.6876 - - - - -
4.7863 70 1.7366 - - - - -
4.8547 71 1.8689 - - - - -
4.9231 72 1.4676 - - - - -
4.9915 73 1.5045 0.3170 0.3538 0.3606 0.2829 0.3675
5.0598 74 1.2155 - - - - -
5.1282 75 1.4365 - - - - -
5.1966 76 1.7451 - - - - -
5.2650 77 1.4537 - - - - -
5.3333 78 1.3813 - - - - -
5.4017 79 1.4035 - - - - -
5.4701 80 1.3912 - - - - -
5.5385 81 1.3286 - - - - -
5.6068 82 1.5153 - - - - -
5.6752 83 1.6745 - - - - -
5.7436 84 1.4323 - - - - -
5.8120 85 1.5299 - - - - -
5.8803 86 1.488 - - - - -
5.9487 87 1.5195 0.3206 0.3556 0.3530 0.2878 0.3605
6.0171 88 1.2999 - - - - -
6.0855 89 1.1511 - - - - -
6.1538 90 1.552 - - - - -
6.2222 91 1.35 - - - - -
6.2906 92 1.218 - - - - -
6.3590 93 1.1712 - - - - -
6.4274 94 1.3381 - - - - -
6.4957 95 1.1716 - - - - -
6.5641 96 1.2117 - - - - -
6.6325 97 1.5349 - - - - -
6.7009 98 1.4564 - - - - -
6.7692 99 1.3541 - - - - -
6.8376 100 1.2468 - - - - -
6.9060 101 1.1519 - - - - -
6.9744 102 1.2421 0.3150 0.3555 0.3501 0.2858 0.3575
7.0427 103 1.0096 - - - - -
7.1111 104 1.1405 - - - - -
7.1795 105 1.2958 - - - - -
7.2479 106 1.35 - - - - -
7.3162 107 1.1291 - - - - -
7.3846 108 0.9968 - - - - -
7.4530 109 1.0454 - - - - -
7.5214 110 1.102 - - - - -
7.5897 111 1.1328 - - - - -
7.6581 112 1.5988 - - - - -
7.7265 113 1.2992 - - - - -
7.7949 114 1.2572 - - - - -
7.8632 115 1.1414 - - - - -
7.9316 116 1.1432 - - - - -
8.0 117 1.1181 0.3154 0.3545 0.3509 0.2884 0.3578
8.0684 118 0.9365 - - - - -
8.1368 119 1.3286 - - - - -
8.2051 120 1.3711 - - - - -
8.2735 121 1.2001 - - - - -
8.3419 122 1.165 - - - - -
8.4103 123 1.0575 - - - - -
8.4786 124 1.105 - - - - -
8.5470 125 1.077 - - - - -
8.6154 126 1.2217 - - - - -
8.6838 127 1.3254 - - - - -
8.7521 128 1.2165 - - - - -
8.8205 129 1.3021 - - - - -
8.8889 130 1.0927 - - - - -
8.9573 131 1.3961 0.3150 0.3540 0.3490 0.2882 0.3588
9.0256 132 1.0779 - - - - -
9.0940 133 0.901 - - - - -
9.1624 134 1.313 - - - - -
9.2308 135 1.1409 - - - - -
9.2991 136 1.1635 - - - - -
9.3675 137 1.0244 - - - - -
9.4359 138 1.0576 - - - - -
9.5043 139 1.0101 - - - - -
9.5726 140 1.1516 0.3152 0.3561 0.3485 0.2877 0.3574
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.3.1
  • Accelerate: 0.27.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Safetensors

  • Model size: 109M params
  • Tensor type: F32