SentenceTransformer based on Snowflake/snowflake-arctic-embed-l-v2.0

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l-v2.0 on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-l-v2.0
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
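
Because pooling uses the CLS token and the final Normalize() module rescales each vector to unit L2 norm, cosine similarity and dot product give identical scores for this model. A quick, hypothetical sanity check using only the standard Sentence Transformers API:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LucaZilli/arctic-l-enhanced")
emb = model.encode(["A quick sanity check."])
# The trailing Normalize() module should make every row unit-length
print(np.linalg.norm(emb, axis=1))  # expected: values very close to 1.0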

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LucaZilli/arctic-l-enhanced")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
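
The same embeddings can also drive semantic search. A small sketch using sentence_transformers.util.semantic_search (the query and corpus below are made-up examples, not data from this model's training set):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("LucaZilli/arctic-l-enhanced")

corpus = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("How is the weather?", convert_to_tensor=True)
# Rank corpus entries by cosine similarity and keep the top 2 per query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])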

Training Details

Training Dataset

json

  • Dataset: json
  • Columns: sentence1, sentence2, score, and split
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
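
A hedged sketch of how a dataset with these columns and this loss would be wired together with the SentenceTransformerTrainer API; the training file path is a placeholder, since the actual data is not published with this card:

import torch
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")

# Placeholder path; columns are sentence1, sentence2, score as listed above
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")
train_dataset = train_dataset.select_columns(["sentence1", "sentence2", "score"])

# CosineSimilarityLoss regresses cosine(sentence1, sentence2) onto the gold
# score with an MSE objective, matching the loss_fct configured above
loss = losses.CosineSimilarityLoss(model, loss_fct=torch.nn.MSELoss())

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()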
    

Evaluation Dataset

json

  • Dataset: json
  • Columns: sentence1, sentence2, score, and split
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
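
For the held-out split, the sentence1/sentence2/score columns map directly onto EmbeddingSimilarityEvaluator; a small sketch with toy pairs standing in for the real evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("LucaZilli/arctic-l-enhanced")

# Toy pairs with scores in [0, 1]; replace with the actual evaluation split
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["The weather is lovely today.", "He drove to the stadium."],
    sentences2=["It's so sunny outside!", "She is reading a book."],
    scores=[0.9, 0.1],
)
print(evaluator(model))  # Pearson/Spearman correlations between cosine and gold scores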
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • learning_rate: 4e-06
  • max_steps: 9291
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
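
These values map one-to-one onto SentenceTransformerTrainingArguments; a hedged sketch (output_dir is a placeholder, all other arguments come from the list above):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="arctic-l-enhanced",  # placeholder output directory
    eval_strategy="steps",
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    learning_rate=4e-6,
    max_steps=9291,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)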

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 4e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: 9291
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0011 10 0.1329 -
0.0022 20 0.1211 -
0.0032 30 0.1533 -
0.0043 40 0.1325 -
0.0054 50 0.1076 -
0.0065 60 0.1349 -
0.0075 70 0.1224 -
0.0086 80 0.1062 -
0.0097 90 0.1026 -
0.0108 100 0.0873 -
0.0118 110 0.0733 -
0.0129 120 0.0799 -
0.0140 130 0.0773 -
0.0151 140 0.0666 -
0.0161 150 0.069 0.0615
0.0172 160 0.0639 -
0.0183 170 0.063 -
0.0194 180 0.0739 -
0.0204 190 0.0708 -
0.0215 200 0.0532 -
0.0226 210 0.0573 -
0.0237 220 0.0503 -
0.0248 230 0.0564 -
0.0258 240 0.0592 -
0.0269 250 0.0555 -
0.0280 260 0.0513 -
0.0291 270 0.055 -
0.0301 280 0.0522 -
0.0312 290 0.054 -
0.0323 300 0.0548 0.0531
0.0334 310 0.0495 -
0.0344 320 0.047 -
0.0355 330 0.0551 -
0.0366 340 0.0534 -
0.0377 350 0.0492 -
0.0387 360 0.0584 -
0.0398 370 0.0452 -
0.0409 380 0.0572 -
0.0420 390 0.0423 -
0.0431 400 0.0533 -
0.0441 410 0.0445 -
0.0452 420 0.0513 -
0.0463 430 0.0446 -
0.0474 440 0.0412 -
0.0484 450 0.0456 0.0544
0.0495 460 0.0401 -
0.0506 470 0.0392 -
0.0517 480 0.042 -
0.0527 490 0.0513 -
0.0538 500 0.0368 -
0.0549 510 0.043 -
0.0560 520 0.0418 -
0.0570 530 0.0419 -
0.0581 540 0.0377 -
0.0592 550 0.0354 -
0.0603 560 0.0358 -
0.0613 570 0.0474 -
0.0624 580 0.0384 -
0.0635 590 0.0411 -
0.0646 600 0.0417 0.0558
0.0657 610 0.0389 -
0.0667 620 0.0418 -
0.0678 630 0.0391 -
0.0689 640 0.0354 -
0.0700 650 0.0428 -
0.0710 660 0.0453 -
0.0721 670 0.0333 -
0.0732 680 0.0466 -
0.0743 690 0.0406 -
0.0753 700 0.0378 -
0.0764 710 0.0399 -
0.0775 720 0.036 -
0.0786 730 0.0403 -
0.0796 740 0.0408 -
0.0807 750 0.0335 0.0531
0.0818 760 0.0335 -
0.0829 770 0.0387 -
0.0840 780 0.035 -
0.0850 790 0.0351 -
0.0861 800 0.0407 -
0.0872 810 0.0371 -
0.0883 820 0.0387 -
0.0893 830 0.0365 -
0.0904 840 0.0395 -
0.0915 850 0.0403 -
0.0926 860 0.04 -
0.0936 870 0.0356 -
0.0947 880 0.0333 -
0.0958 890 0.0269 -
0.0969 900 0.0341 0.0455
0.0979 910 0.0294 -
0.0990 920 0.0269 -
0.1001 930 0.0293 -
0.1012 940 0.034 -
0.1022 950 0.0288 -
0.1033 960 0.017 -
0.1044 970 0.0345 -
0.1055 980 0.0331 -
0.1066 990 0.0279 -
0.1076 1000 0.0255 -
0.1087 1010 0.0279 -
0.1098 1020 0.0232 -
0.1109 1030 0.0299 -
0.1119 1040 0.0268 -
0.1130 1050 0.0196 0.0468
0.1141 1060 0.0235 -
0.1152 1070 0.0305 -
0.1162 1080 0.0429 -
0.1173 1090 0.043 -
0.1184 1100 0.0408 -
0.1195 1110 0.0387 -
0.1205 1120 0.0389 -
0.1216 1130 0.0452 -
0.1227 1140 0.0424 -
0.1238 1150 0.0388 -
0.1249 1160 0.0474 -
0.1259 1170 0.0303 -
0.1270 1180 0.0379 -
0.1281 1190 0.033 -
0.1292 1200 0.0303 0.0361
0.1302 1210 0.0361 -
0.1313 1220 0.0366 -
0.1324 1230 0.0359 -
0.1335 1240 0.0304 -
0.1345 1250 0.0265 -
0.1356 1260 0.0286 -
0.1367 1270 0.0326 -
0.1378 1280 0.0324 -
0.1388 1290 0.0304 -
0.1399 1300 0.0328 -
0.1410 1310 0.0339 -
0.1421 1320 0.0362 -
0.1431 1330 0.0318 -
0.1442 1340 0.0291 -
0.1453 1350 0.0241 0.0345
0.1464 1360 0.0233 -
0.1475 1370 0.029 -
0.1485 1380 0.0224 -
0.1496 1390 0.0364 -
0.1507 1400 0.033 -
0.1518 1410 0.0337 -
0.1528 1420 0.0328 -
0.1539 1430 0.0253 -
0.1550 1440 0.028 -
0.1561 1450 0.023 -
0.1571 1460 0.034 -
0.1582 1470 0.0296 -
0.1593 1480 0.0278 -
0.1604 1490 0.0357 -
0.1614 1500 0.0267 0.0357
0.1625 1510 0.0372 -
0.1636 1520 0.0264 -
0.1647 1530 0.0239 -
0.1658 1540 0.0307 -
0.1668 1550 0.0288 -
0.1679 1560 0.0275 -
0.1690 1570 0.0228 -
0.1701 1580 0.0219 -
0.1711 1590 0.0243 -
0.1722 1600 0.0191 -
0.1733 1610 0.018 -
0.1744 1620 0.0226 -
0.1754 1630 0.0261 -
0.1765 1640 0.0248 -
0.1776 1650 0.0199 0.0359
0.1787 1660 0.0309 -
0.1797 1670 0.0213 -
0.1808 1680 0.0221 -
0.1819 1690 0.0257 -
0.1830 1700 0.0219 -
0.1840 1710 0.0294 -
0.1851 1720 0.021 -
0.1862 1730 0.0215 -
0.1873 1740 0.0187 -
0.1884 1750 0.021 -
0.1894 1760 0.02 -
0.1905 1770 0.0208 -
0.1916 1780 0.0184 -
0.1927 1790 0.0182 -
0.1937 1800 0.0158 0.0398
0.1948 1810 0.0191 -
0.1959 1820 0.0256 -
0.1970 1830 0.0199 -
0.1980 1840 0.0163 -
0.1991 1850 0.0241 -
0.2002 1860 0.0153 -
0.2013 1870 0.0198 -
0.2023 1880 0.0177 -
0.2034 1890 0.0172 -
0.2045 1900 0.0154 -
0.2056 1910 0.0213 -
0.2067 1920 0.0159 -
0.2077 1930 0.0227 -
0.2088 1940 0.0149 -
0.2099 1950 0.0198 0.0423
0.2110 1960 0.0178 -
0.2120 1970 0.0153 -
0.2131 1980 0.0163 -
0.2142 1990 0.0161 -
0.2153 2000 0.014 -
0.2163 2010 0.0143 -
0.2174 2020 0.0188 -
0.2185 2030 0.0159 -
0.2196 2040 0.0189 -
0.2206 2050 0.02 -
0.2217 2060 0.0152 -
0.2228 2070 0.0227 -
0.2239 2080 0.0194 -
0.2249 2090 0.0156 -
0.2260 2100 0.0159 0.0449
0.2271 2110 0.0156 -
0.2282 2120 0.0152 -
0.2293 2130 0.016 -
0.2303 2140 0.0124 -
0.2314 2150 0.0157 -
0.2325 2160 0.0217 -
0.2336 2170 0.0146 -
0.2346 2180 0.015 -
0.2357 2190 0.0139 -
0.2368 2200 0.0139 -
0.2379 2210 0.0181 -
0.2389 2220 0.0196 -
0.2400 2230 0.0163 -
0.2411 2240 0.014 -
0.2422 2250 0.015 0.0469
0.2432 2260 0.0156 -
0.2443 2270 0.0172 -
0.2454 2280 0.016 -
0.2465 2290 0.015 -
0.2476 2300 0.0171 -
0.2486 2310 0.0151 -
0.2497 2320 0.0147 -
0.2508 2330 0.0197 -
0.2519 2340 0.0153 -
0.2529 2350 0.0145 -
0.2540 2360 0.0143 -
0.2551 2370 0.0122 -
0.2562 2380 0.0151 -
0.2572 2390 0.0143 -
0.2583 2400 0.0136 0.0502
0.2594 2410 0.0137 -
0.2605 2420 0.0143 -
0.2615 2430 0.0153 -
0.2626 2440 0.019 -
0.2637 2450 0.0125 -
0.2648 2460 0.0146 -
0.2658 2470 0.0154 -
0.2669 2480 0.0158 -
0.2680 2490 0.0129 -
0.2691 2500 0.0131 -
0.2702 2510 0.0217 -
0.2712 2520 0.0132 -
0.2723 2530 0.0133 -
0.2734 2540 0.0146 -
0.2745 2550 0.0152 0.0555
0.2755 2560 0.014 -
0.2766 2570 0.0174 -
0.2777 2580 0.0161 -
0.2788 2590 0.0145 -
0.2798 2600 0.0193 -
0.2809 2610 0.0145 -
0.2820 2620 0.0146 -
0.2831 2630 0.0129 -
0.2841 2640 0.0158 -
0.2852 2650 0.0165 -
0.2863 2660 0.0135 -
0.2874 2670 0.0163 -
0.2885 2680 0.0159 -
0.2895 2690 0.0146 -
0.2906 2700 0.0186 0.0531
0.2917 2710 0.0161 -
0.2928 2720 0.0149 -
0.2938 2730 0.0147 -
0.2949 2740 0.0128 -
0.2960 2750 0.0198 -
0.2971 2760 0.0123 -
0.2981 2770 0.0133 -
0.2992 2780 0.0146 -
0.3003 2790 0.0133 -
0.3014 2800 0.0158 -
0.3024 2810 0.0125 -
0.3035 2820 0.0122 -
0.3046 2830 0.0129 -
0.3057 2840 0.0132 -
0.3067 2850 0.0138 0.0472
0.3078 2860 0.0134 -
0.3089 2870 0.0142 -
0.3100 2880 0.0141 -
0.3111 2890 0.019 -
0.3121 2900 0.0127 -
0.3132 2910 0.0117 -
0.3143 2920 0.0166 -
0.3154 2930 0.0365 -
0.3164 2940 0.0328 -
0.3175 2950 0.0344 -
0.3186 2960 0.0345 -
0.3197 2970 0.0312 -
0.3207 2980 0.017 -
0.3218 2990 0.0176 -
0.3229 3000 0.0145 0.0400
0.3240 3010 0.0116 -
0.3250 3020 0.018 -
0.3261 3030 0.017 -
0.3272 3040 0.0114 -
0.3283 3050 0.0124 -
0.3294 3060 0.012 -
0.3304 3070 0.0118 -
0.3315 3080 0.01 -
0.3326 3090 0.0147 -
1.0002 3100 0.0212 -
1.0013 3110 0.0488 -
1.0024 3120 0.0495 -
1.0034 3130 0.0384 -
1.0045 3140 0.0422 -
1.0056 3150 0.0326 0.0453
1.0067 3160 0.0375 -
1.0077 3170 0.0397 -
1.0088 3180 0.0469 -
1.0099 3190 0.0462 -
1.0110 3200 0.034 -
1.0121 3210 0.048 -
1.0131 3220 0.0377 -
1.0142 3230 0.0299 -
1.0153 3240 0.0344 -
1.0164 3250 0.04 -
1.0174 3260 0.0399 -
1.0185 3270 0.037 -
1.0196 3280 0.0365 -
1.0207 3290 0.039 -
1.0217 3300 0.0355 0.0462
1.0228 3310 0.0328 -
1.0239 3320 0.0297 -
1.0250 3330 0.031 -
1.0260 3340 0.0387 -
1.0271 3350 0.0297 -
1.0282 3360 0.0355 -
1.0293 3370 0.0399 -
1.0304 3380 0.0321 -
1.0314 3390 0.0265 -
1.0325 3400 0.0345 -
1.0336 3410 0.0276 -
1.0347 3420 0.036 -
1.0357 3430 0.0295 -
1.0368 3440 0.036 -
1.0379 3450 0.032 0.0434

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.2.2
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0
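
To approximate this environment, the listed versions can be pinned directly (a sketch; the exact PyTorch build, e.g. CUDA vs. CPU, may differ):

pip install sentence-transformers==3.4.1 transformers==4.49.0 torch==2.2.2 accelerate==1.4.0 datasets==3.3.2 tokenizers==0.21.0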

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}