
SentenceTransformer based on Qwen/Qwen2-1.5B-instruct

This is a sentence-transformers model finetuned from Qwen/Qwen2-1.5B-instruct. It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen2-1.5B-instruct
  • Maximum Sequence Length: 32768 tokens
  • Output Dimensionality: 1536 dimensions
  • Similarity Function: Cosine Similarity
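
These properties can also be checked programmatically once the model is loaded (installation is covered under Usage below); a minimal sketch:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("asbabiy/crm-mail-embedder-cosent")
print(model.max_seq_length)                      # 32768
print(model.get_sentence_embedding_dimension())  # 1536
print(model.similarity_fn_name)                  # "cosine"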

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
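
The Pooling module uses last-token pooling (pooling_mode_lasttoken: True), and the final Normalize module rescales each embedding to unit length, so cosine similarity reduces to a dot product. A simplified sketch of those two steps, assuming right-padded inputs and omitting the library's exact implementation details:

import torch

def last_token_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Pick the hidden state of the last non-padding token in each sequence
    # (a simplified stand-in for pooling_mode_lasttoken=True).
    last_idx = attention_mask.sum(dim=1) - 1            # index of the last real token
    batch_idx = torch.arange(token_embeddings.size(0))
    pooled = token_embeddings[batch_idx, last_idx]      # shape: (batch, 1536)
    # Normalize() scales each vector to unit length.
    return torch.nn.functional.normalize(pooled, p=2, dim=1)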

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("asbabiy/crm-mail-embedder-cosent")
# Run inference
sentences = [
    'Mail Queue: ratehawk-b2b\nMail From: aa3b09f5a33cf090e29667bf72936a77@travelclub.ae\nMail To: support@ratehawk.com\n\nMail Subject: Ticket Closed - URGENT : Reconfirmation & HCN for ATS160057 : 139201464/Check-in date - 12 Mar 2024\n\nMail Body:\n"""\nDear Support,  Your ticket - URGENT : Reconfirmation & HCN for ATS160057 : 139201464/Check-in date - 12 Mar 2024 -  has been closed.  We hope that the ticket was resolved to your satisfaction. If you feel that the ticket should not be closed or if the ticket has not been resolved, please reply to this email.  Sincerely, Travelclub Support Team https://blue7tech-help.freshdesk.com/helpdesk/tickets/63824\n"""',
    "Email category: 'TPP -- Auto template'. Email category description: 'This is an automated email from the supplier acknowledging receipt of a previous communication or providing a status update on a pending request without any specific update on the request. It solely includes a phrase indicating that the request has been acknowledged. Such emails may contain messages such as: information that the request has been taken or in process; that the ticket for the request has been created; that it is a holiday and the office hours have changed; that the company's working hours have been adjusted; that a number has been assigned to the request and updates will be provided once available; that the information has been received and transffered to the guest or hotel; or that they will contact us shortly. Also this can be message from any of our supplier stating that our account recently attempted to log in from New Browser. The purpose of this email is to let you know that your message has been received and is being handled.Email lacks personalized details specific to the recipient's situation or references to a unique order or request, which may indicate it is a generic automated response. Auto-emails are often rich with html formatting, tabular data and have a lot of tags or links.'",
    "Email category: 'TPP -- Additional request of arrival time'. Email category description: 'A request from the supplier asking for the client to provide the exact or approximate check-in/arrival time as this is requested by the hotel due to different reasons. For example, the hotel does not have 24 hour reception and for this reason is asking for the arrival time. Information about the check-in helps the hotel better prepare for the guest's arrival and plan the schedule of the hotel staff.'",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1536]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
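
Because the similarity function is cosine over normalized embeddings, a natural follow-up is to rank the candidate category descriptions against the email. A small sketch reusing the sentences and embeddings from the snippet above (the two category texts are only the examples shown here, not a full label set):

# The email is sentences[0]; the category descriptions are sentences[1:].
query_embedding = embeddings[0:1]
candidate_embeddings = embeddings[1:]
scores = model.similarity(query_embedding, candidate_embeddings)  # shape: [1, 2]
best = int(scores.argmax())
print(f"Best-matching category: sentences[{best + 1}] (score {scores[0, best].item():.4f})")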

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 16
  • learning_rate: 1e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
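
A hedged sketch of how these non-default hyperparameters could map onto a Sentence Transformers v3 training run with the CoSENTLoss cited below. The module setup mirrors the architecture above, but the dataset files and column layout are placeholders, not this model's actual training data:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    models,
)
from sentence_transformers.losses import CoSENTLoss

# Rebuild the module stack shown under "Full Model Architecture".
transformer = models.Transformer("Qwen/Qwen2-1.5B-instruct", max_seq_length=32768)
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="lasttoken")
model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])

# CoSENTLoss scores sentence pairs against a float similarity label.
loss = CoSENTLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/crm-mail-embedder-cosent",
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
)

# Hypothetical dataset files with "sentence1", "sentence2", "score" columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "eval": "eval.csv"})

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["eval"],
    loss=loss,
)
trainer.train()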

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Click to expand
Epoch   Step   Training Loss   Validation Loss
0.0031 5 1.8139 -
0.0062 10 1.699 -
0.0093 15 1.6467 -
0.0124 20 1.7853 -
0.0155 25 1.7918 -
0.0186 30 1.9042 -
0.0217 35 1.7087 -
0.0248 40 1.7143 -
0.0279 45 1.7357 -
0.0310 50 1.5956 1.6129
0.0341 55 1.7191 -
0.0372 60 1.5434 -
0.0403 65 1.6527 -
0.0434 70 1.6267 -
0.0465 75 1.5512 -
0.0497 80 1.4611 -
0.0528 85 1.49 -
0.0559 90 1.4336 -
0.0590 95 1.3646 -
0.0621 100 1.5523 1.4122
0.0652 105 1.4359 -
0.0683 110 1.4459 -
0.0714 115 1.4872 -
0.0745 120 1.3775 -
0.0776 125 1.3807 -
0.0807 130 1.3692 -
0.0838 135 1.3156 -
0.0869 140 1.328 -
0.0900 145 1.5123 -
0.0931 150 1.4037 1.3554
0.0962 155 1.4797 -
0.0993 160 1.4434 -
0.1024 165 1.3876 -
0.1055 170 1.3611 -
0.1086 175 1.3986 -
0.1117 180 1.3135 -
0.1148 185 1.3268 -
0.1179 190 1.2853 -
0.1210 195 1.3606 -
0.1241 200 1.4254 1.3225
0.1272 205 1.3152 -
0.1303 210 1.3482 -
0.1334 215 1.347 -
0.1365 220 1.3722 -
0.1396 225 1.3877 -
0.1428 230 1.3635 -
0.1459 235 1.4738 -
0.1490 240 1.4063 -
0.1521 245 1.3481 -
0.1552 250 1.3221 1.2848
0.1583 255 1.1117 -
0.1614 260 1.33 -
0.1645 265 1.3461 -
0.1676 270 1.2067 -
0.1707 275 1.3238 -
0.1738 280 1.4214 -
0.1769 285 1.3172 -
0.1800 290 1.2829 -
0.1831 295 1.3561 -
0.1862 300 1.2153 1.2869
0.1893 305 1.3482 -
0.1924 310 1.4491 -
0.1955 315 1.296 -
0.1986 320 1.5481 -
0.2017 325 1.3483 -
0.2048 330 1.2984 -
0.2079 335 1.2619 -
0.2110 340 1.2424 -
0.2141 345 1.3138 -
0.2172 350 1.4771 1.2831
0.2203 355 1.4589 -
0.2234 360 1.2647 -
0.2265 365 1.3268 -
0.2296 370 1.2185 -
0.2327 375 1.2264 -
0.2359 380 1.4256 -
0.2390 385 1.5409 -
0.2421 390 1.3106 -
0.2452 395 1.3129 -
0.2483 400 1.4063 1.2688
0.2514 405 1.1013 -
0.2545 410 1.3415 -
0.2576 415 1.4586 -
0.2607 420 1.2412 -
0.2638 425 1.3019 -
0.2669 430 1.2388 -
0.2700 435 1.3902 -
0.2731 440 1.3822 -
0.2762 445 1.2138 -
0.2793 450 1.4039 1.2490
0.2824 455 1.1758 -
0.2855 460 1.306 -
0.2886 465 1.4698 -
0.2917 470 1.2116 -
0.2948 475 1.2531 -
0.2979 480 1.3357 -
0.3010 485 1.1919 -
0.3041 490 1.3818 -
0.3072 495 1.2979 -
0.3103 500 1.2832 1.2466
0.3134 505 1.1689 -
0.3165 510 1.2198 -
0.3196 515 1.2775 -
0.3227 520 1.1344 -
0.3258 525 1.4492 -
0.3289 530 1.2328 -
0.3321 535 1.3306 -
0.3352 540 1.1076 -
0.3383 545 1.285 -
0.3414 550 1.2523 1.2435
0.3445 555 1.1712 -
0.3476 560 1.4021 -
0.3507 565 1.3476 -
0.3538 570 1.1485 -
0.3569 575 1.2621 -
0.3600 580 1.2829 -
0.3631 585 1.274 -
0.3662 590 1.2649 -
0.3693 595 1.2262 -
0.3724 600 1.1743 1.2378
0.3755 605 1.1773 -
0.3786 610 1.1977 -
0.3817 615 1.3976 -
0.3848 620 1.1817 -
0.3879 625 1.1928 -
0.3910 630 1.2338 -
0.3941 635 1.1803 -
0.3972 640 1.3811 -
0.4003 645 1.3125 -
0.4034 650 1.1878 1.2311
0.4065 655 1.4805 -
0.4096 660 1.1262 -
0.4127 665 1.1919 -
0.4158 670 1.2076 -
0.4189 675 1.2401 -
0.4220 680 1.3019 -
0.4252 685 1.3285 -
0.4283 690 1.1257 -
0.4314 695 1.2628 -
0.4345 700 1.1846 1.2354
0.4376 705 1.0939 -
0.4407 710 1.2502 -
0.4438 715 1.3645 -
0.4469 720 1.2408 -
0.4500 725 1.3127 -
0.4531 730 1.2795 -
0.4562 735 1.3127 -
0.4593 740 1.2164 -
0.4624 745 1.2942 -
0.4655 750 1.1968 1.2342
0.4686 755 1.2426 -
0.4717 760 1.2269 -
0.4748 765 1.3602 -
0.4779 770 1.2335 -
0.4810 775 1.3015 -
0.4841 780 1.1144 -
0.4872 785 1.3083 -
0.4903 790 1.273 -
0.4934 795 1.1784 -
0.4965 800 1.204 1.2348

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.0
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}