
Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base Transformer: Qwen2Model
  • Model Size: 1.54B parameters (BF16 safetensors)
  • Output Dimensionality: 4096
  • Maximum Sequence Length: 32768 tokens
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 32768, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 4096, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
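
The three modules compute, in order: token representations from a Qwen2 backbone, last-token pooling, and L2 normalization. For reference, here is a rough equivalent written against the plain transformers API (a sketch only; it assumes a right-padding tokenizer and reuses the "<model_name>" placeholder from the usage section below):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<model_name>")
backbone = AutoModel.from_pretrained("<model_name>", torch_dtype=torch.bfloat16)

batch = tokenizer(
    ["example text"], padding=True, truncation=True,
    max_length=32768, return_tensors="pt",
)

with torch.no_grad():
    hidden = backbone(**batch).last_hidden_state      # (batch, seq_len, 4096)

# (1) Last-token pooling: take the hidden state of the last non-padding token.
last = batch["attention_mask"].sum(dim=1) - 1         # index of the last real token
embeddings = hidden[torch.arange(hidden.size(0)), last]

# (2) L2 normalization, so dot products equal cosine similarities.
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)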

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

import torch
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer(
    "<model_name>",
    model_kwargs={"attn_implementation": "flash_attention_2", "torch_dtype": torch.bfloat16}
)
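# Note: "flash_attention_2" needs the optional flash-attn package and a
# recent NVIDIA GPU; if it is unavailable, drop that entry from model_kwargs
# to fall back to the default attention implementation.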

# Run inference
documents = [
    'Your Upcoming Stay at Park Tower Knightsbridge\n\n\r\n[cid:image001.png@01DA125C.D84FF1A0]\r\n\r\n\r\nDear Abdulla Alhassani,\r\n\r\n\r\n\r\nThank you for choosing The Park Tower Knightsbridge, A Luxury Collection Hotel for your upcoming visit! We are looking forward to welcoming you to the hotel and\r\n\r\nwould like to prepare for your arrival with any special requests you may have. A few details from you, can help us best prepare and make your stay as memorable as possible.\r\n\r\nPlease share with us your estimated arrival time and let us know if we may assist you in booking a private car or taxi service to get to the hotel.\r\n\r\nEmail us here to book your car now.\r\n\r\n\r\n\r\nYOUR RESERVATION\r\n\r\nARRIVAL: 11/11/2023\r\n\r\nDEPARTURE: 11/22/2023\r\n\r\nCONFIRMATION NUMBER: 95476045\r\n\r\n\r\n\r\n*IF TRAVELLING WITH CHILDREN PLEASE CONFIRM THEIR AGES\r\n\r\n\r\n\r\n\r\n\r\n[cid:image002.png@01DA125C.D84FF1A0][cid:image003.png@01DA125C.D84FF1A0][cid:image004.png@01DA125C.D84FF1A0]\r\n\r\n\r\n[cid:image005.png@01DA125C.D84FF1A0]\r\n[cid:image006.png@01DA125C.D84FF1A0][cid:image007.png@01DA125C.D84FF1A0]\r\n[cid:image008.png@01DA125C.D84FF1A0]\n',
]

categories = [
    "Email category: 'Hotel -- Additional request of arrival time'. Email category description: 'A request from the hotel asking for the client to provide the exact or approximate check-in/arrival time as this is requested by the hotel due to different reasons. For example, the hotel does not have 24 hour reception and for this reason is asking for the arrival time. Information about the check-in helps the hotel better prepare for the guest's arrival and plan the schedule of the hotel staff.'",
    "Email category: 'Hotel -- Content request '. Email category description: 'This is an email from a hotelier who has seen photos of the hotel where they work and requests that certain photos be removed, changed, or new ones added. It could also be a request to modify the description of a particular facility or service at the hotel, such as information about type of meals, the deposit amount, or parking facilities. These are important letters that help us keep the hotel photo gallery on the website up to date, ensuring that guests can be confident in what they are booking when looking at the photos. We send such letters to the content department.'",
]

document_embeddings = model.encode(documents)
category_embeddings = model.encode(categories)

print(document_embeddings.shape)
# [1, 4096]

print(category_embeddings.shape)
# [2, 4096]

# Get the similarity scores for the embeddings
similarities = model.similarity(document_embeddings, category_embeddings)
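
With one document and two candidate categories, similarities has shape [1, 2], and the matrix can be used directly for classification. A minimal follow-up, using only the variables defined above:

# Assign each document to its highest-scoring category
best = similarities.argmax(dim=1)
for i, j in enumerate(best.tolist()):
    print(f"Document {i} -> category {j} (score: {similarities[i, j].item():.4f})")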

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 6
  • per_device_eval_batch_size: 6
  • learning_rate: 2e-06
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
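
The citation section below indicates training used CachedMultipleNegativesRankingLoss, and the non-default values above map directly onto SentenceTransformerTrainingArguments. A minimal sketch of that configuration (the output directory and all remaining defaults are assumptions):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # assumption: not stated in the card
    eval_strategy="steps",
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    learning_rate=2e-6,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)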

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 6
  • per_device_eval_batch_size: 6
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Validation loss was evaluated every 200 training steps; "-" marks steps without an evaluation.

Epoch    Step    Training Loss    Validation Loss
0.0252 50 0.5621 -
0.0504 100 0.6789 -
0.0755 150 0.7126 -
0.1007 200 0.6461 0.1758
0.1259 250 0.3928 -
0.1511 300 0.3786 -
0.1762 350 0.4105 -
0.2014 400 0.3354 0.1420
0.2266 450 0.327 -
0.2518 500 0.2494 -
0.2769 550 0.1773 -
0.3021 600 0.1215 0.1241
0.3273 650 0.2426 -
0.3525 700 0.2279 -
0.3776 750 0.2151 -
0.4028 800 0.2676 0.1216
0.4280 850 0.2645 -
0.4532 900 0.2491 -
0.4783 950 0.2945 -
0.5035 1000 0.1859 0.1206
0.5287 1050 0.2401 -
0.5539 1100 0.2154 -
0.5791 1150 0.1731 -
0.6042 1200 0.1942 0.1196
0.6294 1250 0.2643 -
0.6546 1300 0.1806 -
0.6798 1350 0.1609 -
0.7049 1400 0.1008 0.1187

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
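
To recreate this environment, the Python-package versions above can be pinned at install time (the matching PyTorch 2.2.0 CUDA 12.1 wheel must come from the appropriate index, which depends on your setup):

pip install "sentence-transformers==3.0.1" "transformers==4.42.4" "accelerate==0.33.0" "datasets==2.20.0" "tokenizers==0.19.1"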

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup}, 
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}