SentenceTransformer based on Snowflake/snowflake-arctic-embed-l-v2.0

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-l-v2.0 on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-l-v2.0
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
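
Because pooling uses the CLS token and the final Normalize() module rescales each vector to unit L2 norm, cosine similarity and dot product give identical scores for this model. A quick, hypothetical sanity check using only the standard Sentence Transformers API:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LucaZilli/arctic-l-enhanced")
emb = model.encode(["A quick sanity check."])
# The trailing Normalize() module should make every row unit-length
print(np.linalg.norm(emb, axis=1))  # expected: values very close to 1.0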

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("LucaZilli/arctic-l-enhanced")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
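
The same embeddings can also drive semantic search. A small sketch using sentence_transformers.util.semantic_search (the query and corpus below are made-up examples, not data from this model's training set):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("LucaZilli/arctic-l-enhanced")

corpus = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("How is the weather?", convert_to_tensor=True)
# Rank corpus entries by cosine similarity and keep the top 2 per query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])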

Training Details

Training Dataset

json

  • Dataset: json
  • Columns: sentence1, sentence2, score, and split
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
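
A hedged sketch of how a dataset with these columns and this loss would be wired together with the SentenceTransformerTrainer API; the training file path is a placeholder, since the actual data is not published with this card:

import torch
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")

# Placeholder path; columns are sentence1, sentence2, score as listed above
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")
train_dataset = train_dataset.select_columns(["sentence1", "sentence2", "score"])

# CosineSimilarityLoss regresses cosine(sentence1, sentence2) onto the gold
# score with an MSE objective, matching the loss_fct configured above
loss = losses.CosineSimilarityLoss(model, loss_fct=torch.nn.MSELoss())

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()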
    

Evaluation Dataset

json

  • Dataset: json
  • Columns: sentence1, sentence2, score, and split
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
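
For the held-out split, the sentence1/sentence2/score columns map directly onto EmbeddingSimilarityEvaluator; a small sketch with toy pairs standing in for the real evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("LucaZilli/arctic-l-enhanced")

# Toy pairs with scores in [0, 1]; replace with the actual evaluation split
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["The weather is lovely today.", "He drove to the stadium."],
    sentences2=["It's so sunny outside!", "She is reading a book."],
    scores=[0.9, 0.1],
)
print(evaluator(model))  # Pearson/Spearman correlations between cosine and gold scores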
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • learning_rate: 4e-06
  • max_steps: 9291
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
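
These values map one-to-one onto SentenceTransformerTrainingArguments; a hedged sketch (output_dir is a placeholder, all other arguments come from the list above):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="arctic-l-enhanced",  # placeholder output directory
    eval_strategy="steps",
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    learning_rate=4e-6,
    max_steps=9291,
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
)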

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 4e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: 9291
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0011 10 0.1329 -
0.0022 20 0.1211 -
0.0032 30 0.1533 -
0.0043 40 0.1325 -
0.0054 50 0.1076 -
0.0065 60 0.1349 -
0.0075 70 0.1224 -
0.0086 80 0.1062 -
0.0097 90 0.1026 -
0.0108 100 0.0873 -
0.0118 110 0.0733 -
0.0129 120 0.0799 -
0.0140 130 0.0773 -
0.0151 140 0.0666 -
0.0161 150 0.069 0.0615
0.0172 160 0.0639 -
0.0183 170 0.063 -
0.0194 180 0.0739 -
0.0204 190 0.0708 -
0.0215 200 0.0532 -
0.0226 210 0.0573 -
0.0237 220 0.0503 -
0.0248 230 0.0564 -
0.0258 240 0.0592 -
0.0269 250 0.0555 -
0.0280 260 0.0513 -
0.0291 270 0.055 -
0.0301 280 0.0522 -
0.0312 290 0.054 -
0.0323 300 0.0548 0.0531
0.0334 310 0.0495 -
0.0344 320 0.047 -
0.0355 330 0.0551 -
0.0366 340 0.0534 -
0.0377 350 0.0492 -
0.0387 360 0.0584 -
0.0398 370 0.0452 -
0.0409 380 0.0572 -
0.0420 390 0.0423 -
0.0431 400 0.0533 -
0.0441 410 0.0445 -
0.0452 420 0.0513 -
0.0463 430 0.0446 -
0.0474 440 0.0412 -
0.0484 450 0.0456 0.0544
0.0495 460 0.0401 -
0.0506 470 0.0392 -
0.0517 480 0.042 -
0.0527 490 0.0513 -
0.0538 500 0.0368 -
0.0549 510 0.043 -
0.0560 520 0.0418 -
0.0570 530 0.0419 -
0.0581 540 0.0377 -
0.0592 550 0.0354 -
0.0603 560 0.0358 -
0.0613 570 0.0474 -
0.0624 580 0.0384 -
0.0635 590 0.0411 -
0.0646 600 0.0417 0.0558
0.0657 610 0.0389 -
0.0667 620 0.0418 -
0.0678 630 0.0391 -
0.0689 640 0.0354 -
0.0700 650 0.0428 -
0.0710 660 0.0453 -
0.0721 670 0.0333 -
0.0732 680 0.0466 -
0.0743 690 0.0406 -
0.0753 700 0.0378 -
0.0764 710 0.0399 -
0.0775 720 0.036 -
0.0786 730 0.0403 -
0.0796 740 0.0408 -
0.0807 750 0.0335 0.0531
0.0818 760 0.0335 -
0.0829 770 0.0387 -
0.0840 780 0.035 -
0.0850 790 0.0351 -
0.0861 800 0.0407 -
0.0872 810 0.0371 -
0.0883 820 0.0387 -
0.0893 830 0.0365 -
0.0904 840 0.0395 -
0.0915 850 0.0403 -
0.0926 860 0.04 -
0.0936 870 0.0356 -
0.0947 880 0.0333 -
0.0958 890 0.0269 -
0.0969 900 0.0341 0.0455
0.0979 910 0.0294 -
0.0990 920 0.0269 -
0.1001 930 0.0293 -
0.1012 940 0.034 -
0.1022 950 0.0288 -
0.1033 960 0.017 -
0.1044 970 0.0345 -
0.1055 980 0.0331 -
0.1066 990 0.0279 -
0.1076 1000 0.0255 -
0.1087 1010 0.0279 -
0.1098 1020 0.0232 -
0.1109 1030 0.0299 -
0.1119 1040 0.0268 -
0.1130 1050 0.0196 0.0468
0.1141 1060 0.0235 -
0.1152 1070 0.0305 -
0.1162 1080 0.0429 -
0.1173 1090 0.043 -
0.1184 1100 0.0408 -
0.1195 1110 0.0387 -
0.1205 1120 0.0389 -
0.1216 1130 0.0452 -
0.1227 1140 0.0424 -
0.1238 1150 0.0388 -
0.1249 1160 0.0474 -
0.1259 1170 0.0303 -
0.1270 1180 0.0379 -
0.1281 1190 0.033 -
0.1292 1200 0.0303 0.0361
0.1302 1210 0.0361 -
0.1313 1220 0.0366 -
0.1324 1230 0.0359 -
0.1335 1240 0.0304 -
0.1345 1250 0.0265 -
0.1356 1260 0.0286 -
0.1367 1270 0.0326 -
0.1378 1280 0.0324 -
0.1388 1290 0.0304 -
0.1399 1300 0.0328 -
0.1410 1310 0.0339 -
0.1421 1320 0.0362 -
0.1431 1330 0.0318 -
0.1442 1340 0.0291 -
0.1453 1350 0.0241 0.0345
0.1464 1360 0.0233 -
0.1475 1370 0.029 -
0.1485 1380 0.0224 -
0.1496 1390 0.0364 -
0.1507 1400 0.033 -
0.1518 1410 0.0337 -
0.1528 1420 0.0328 -
0.1539 1430 0.0253 -
0.1550 1440 0.028 -
0.1561 1450 0.023 -
0.1571 1460 0.034 -
0.1582 1470 0.0296 -
0.1593 1480 0.0278 -
0.1604 1490 0.0357 -
0.1614 1500 0.0267 0.0357
0.1625 1510 0.0372 -
0.1636 1520 0.0264 -
0.1647 1530 0.0239 -
0.1658 1540 0.0307 -
0.1668 1550 0.0288 -
0.1679 1560 0.0275 -
0.1690 1570 0.0228 -
0.1701 1580 0.0219 -
0.1711 1590 0.0243 -
0.1722 1600 0.0191 -
0.1733 1610 0.018 -
0.1744 1620 0.0226 -
0.1754 1630 0.0261 -
0.1765 1640 0.0248 -
0.1776 1650 0.0199 0.0359
0.1787 1660 0.0309 -
0.1797 1670 0.0213 -
0.1808 1680 0.0221 -
0.1819 1690 0.0257 -
0.1830 1700 0.0219 -
0.1840 1710 0.0294 -
0.1851 1720 0.021 -
0.1862 1730 0.0215 -
0.1873 1740 0.0187 -
0.1884 1750 0.021 -
0.1894 1760 0.02 -
0.1905 1770 0.0208 -
0.1916 1780 0.0184 -
0.1927 1790 0.0182 -
0.1937 1800 0.0158 0.0398
0.1948 1810 0.0191 -
0.1959 1820 0.0256 -
0.1970 1830 0.0199 -
0.1980 1840 0.0163 -
0.1991 1850 0.0241 -
0.2002 1860 0.0153 -
0.2013 1870 0.0198 -
0.2023 1880 0.0177 -
0.2034 1890 0.0172 -
0.2045 1900 0.0154 -
0.2056 1910 0.0213 -
0.2067 1920 0.0159 -
0.2077 1930 0.0227 -
0.2088 1940 0.0149 -
0.2099 1950 0.0198 0.0423
0.2110 1960 0.0178 -
0.2120 1970 0.0153 -
0.2131 1980 0.0163 -
0.2142 1990 0.0161 -
0.2153 2000 0.014 -
0.2163 2010 0.0143 -
0.2174 2020 0.0188 -
0.2185 2030 0.0159 -
0.2196 2040 0.0189 -
0.2206 2050 0.02 -
0.2217 2060 0.0152 -
0.2228 2070 0.0227 -
0.2239 2080 0.0194 -
0.2249 2090 0.0156 -
0.2260 2100 0.0159 0.0449
0.2271 2110 0.0156 -
0.2282 2120 0.0152 -
0.2293 2130 0.016 -
0.2303 2140 0.0124 -
0.2314 2150 0.0157 -
0.2325 2160 0.0217 -
0.2336 2170 0.0146 -
0.2346 2180 0.015 -
0.2357 2190 0.0139 -
0.2368 2200 0.0139 -
0.2379 2210 0.0181 -
0.2389 2220 0.0196 -
0.2400 2230 0.0163 -
0.2411 2240 0.014 -
0.2422 2250 0.015 0.0469
0.2432 2260 0.0156 -
0.2443 2270 0.0172 -
0.2454 2280 0.016 -
0.2465 2290 0.015 -
0.2476 2300 0.0171 -
0.2486 2310 0.0151 -
0.2497 2320 0.0147 -
0.2508 2330 0.0197 -
0.2519 2340 0.0153 -
0.2529 2350 0.0145 -
0.2540 2360 0.0143 -
0.2551 2370 0.0122 -
0.2562 2380 0.0151 -
0.2572 2390 0.0143 -
0.2583 2400 0.0136 0.0502
0.2594 2410 0.0137 -
0.2605 2420 0.0143 -
0.2615 2430 0.0153 -
0.2626 2440 0.019 -
0.2637 2450 0.0125 -
0.2648 2460 0.0146 -
0.2658 2470 0.0154 -
0.2669 2480 0.0158 -
0.2680 2490 0.0129 -
0.2691 2500 0.0131 -
0.2702 2510 0.0217 -
0.2712 2520 0.0132 -
0.2723 2530 0.0133 -
0.2734 2540 0.0146 -
0.2745 2550 0.0152 0.0555
0.2755 2560 0.014 -
0.2766 2570 0.0174 -
0.2777 2580 0.0161 -
0.2788 2590 0.0145 -
0.2798 2600 0.0193 -
0.2809 2610 0.0145 -
0.2820 2620 0.0146 -
0.2831 2630 0.0129 -
0.2841 2640 0.0158 -
0.2852 2650 0.0165 -
0.2863 2660 0.0135 -
0.2874 2670 0.0163 -
0.2885 2680 0.0159 -
0.2895 2690 0.0146 -
0.2906 2700 0.0186 0.0531
0.2917 2710 0.0161 -
0.2928 2720 0.0149 -
0.2938 2730 0.0147 -
0.2949 2740 0.0128 -
0.2960 2750 0.0198 -
0.2971 2760 0.0123 -
0.2981 2770 0.0133 -
0.2992 2780 0.0146 -
0.3003 2790 0.0133 -
0.3014 2800 0.0158 -
0.3024 2810 0.0125 -
0.3035 2820 0.0122 -
0.3046 2830 0.0129 -
0.3057 2840 0.0132 -
0.3067 2850 0.0138 0.0472
0.3078 2860 0.0134 -
0.3089 2870 0.0142 -
0.3100 2880 0.0141 -
0.3111 2890 0.019 -
0.3121 2900 0.0127 -
0.3132 2910 0.0117 -
0.3143 2920 0.0166 -
0.3154 2930 0.0365 -
0.3164 2940 0.0328 -
0.3175 2950 0.0344 -
0.3186 2960 0.0345 -
0.3197 2970 0.0312 -
0.3207 2980 0.017 -
0.3218 2990 0.0176 -
0.3229 3000 0.0145 0.0400
0.3240 3010 0.0116 -
0.3250 3020 0.018 -
0.3261 3030 0.017 -
0.3272 3040 0.0114 -
0.3283 3050 0.0124 -
0.3294 3060 0.012 -
0.3304 3070 0.0118 -
0.3315 3080 0.01 -
0.3326 3090 0.0147 -
1.0002 3100 0.0212 -
1.0013 3110 0.0488 -
1.0024 3120 0.0495 -
1.0034 3130 0.0384 -
1.0045 3140 0.0422 -
1.0056 3150 0.0326 0.0453
1.0067 3160 0.0375 -
1.0077 3170 0.0397 -
1.0088 3180 0.0469 -
1.0099 3190 0.0462 -
1.0110 3200 0.034 -
1.0121 3210 0.048 -
1.0131 3220 0.0377 -
1.0142 3230 0.0299 -
1.0153 3240 0.0344 -
1.0164 3250 0.04 -
1.0174 3260 0.0399 -
1.0185 3270 0.037 -
1.0196 3280 0.0365 -
1.0207 3290 0.039 -
1.0217 3300 0.0355 0.0462
1.0228 3310 0.0328 -
1.0239 3320 0.0297 -
1.0250 3330 0.031 -
1.0260 3340 0.0387 -
1.0271 3350 0.0297 -
1.0282 3360 0.0355 -
1.0293 3370 0.0399 -
1.0304 3380 0.0321 -
1.0314 3390 0.0265 -
1.0325 3400 0.0345 -
1.0336 3410 0.0276 -
1.0347 3420 0.036 -
1.0357 3430 0.0295 -
1.0368 3440 0.036 -
1.0379 3450 0.032 0.0434

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.2.2
  • Accelerate: 1.4.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0
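
To approximate this environment, the listed versions can be pinned directly (a sketch; the exact PyTorch build, e.g. CUDA vs. CPU, may differ):

pip install sentence-transformers==3.4.1 transformers==4.49.0 torch==2.2.2 accelerate==1.4.0 datasets==3.3.2 tokenizers==0.21.0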

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}