SentenceTransformer based on sentence-transformers/gtr-t5-base

This is a sentence-transformers model finetuned from sentence-transformers/gtr-t5-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/gtr-t5-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: ~110M parameters (F32)
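
These properties can be verified directly on the loaded model. A minimal sketch, assuming the repository id abd1987/esco-gtr-t5 under which this card was published:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("abd1987/esco-gtr-t5")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # "cosine"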

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: T5EncoderModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Normalize()
)
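
In plain terms, the pipeline mean-pools the T5 encoder's token embeddings, projects them through a bias-free 768-to-768 linear layer, and L2-normalizes the result. The sketch below reproduces that forward pass by hand for illustration; model.encode does all of this internally, and the repository id abd1987/esco-gtr-t5 is the one this card was published under.

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("abd1987/esco-gtr-t5")

# Tokenize and move the batch to the same device as the model
features = model.tokenize(["negotiate tourism rates", "handle carriers"])
features = {key: value.to(model.device) for key, value in features.items()}

with torch.no_grad():
    # (0) Transformer: T5 encoder token embeddings, shape (batch, seq_len, 768)
    token_embeddings = model[0].auto_model(
        input_ids=features["input_ids"],
        attention_mask=features["attention_mask"],
    ).last_hidden_state

    # (1) Pooling: attention-mask-aware mean over tokens
    mask = features["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

    # (2) Dense: bias-free 768 -> 768 projection (activation is Identity)
    projected = model[2].linear(pooled)

    # (3) Normalize: L2-normalize so cosine similarity reduces to a dot product
    embeddings = torch.nn.functional.normalize(projected, p=2, dim=1)

# Should match model.encode(...) up to floating-point noise
reference = model.encode(["negotiate tourism rates", "handle carriers"], convert_to_tensor=True)
print(torch.allclose(embeddings, reference, atol=1e-5))  # expected: True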

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("abd1987/esco-gtr-t5")
# Run inference
sentences = [
    'The travel agent secured the best possible deals for the group tour.',
    'negotiate tourism rates',
    'work with cultural venue specialists',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
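
Because the embeddings are L2-normalized, cosine similarity is a natural ranking score. A small follow-up sketch, reusing the model loaded above, that ranks the two candidate phrases against the first sentence:

query = "The travel agent secured the best possible deals for the group tour."
candidates = ["negotiate tourism rates", "work with cultural venue specialists"]

query_embedding = model.encode([query])
candidate_embeddings = model.encode(candidates)

scores = model.similarity(query_embedding, candidate_embeddings)  # shape [1, 2]
best = scores.argmax().item()
print(candidates[best], float(scores[0, best]))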

Training Details

Training Dataset

Unnamed Dataset

  • Size: 713,657 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 7 tokens, mean 15.46 tokens, max 56 tokens
    • sentence_1 (string): min 2 tokens, mean 5.33 tokens, max 15 tokens
    • label (float): min 1.0, mean 1.0, max 1.0
  • Samples (sentence_0 / sentence_1 / label):
    • "This role involves planning delivery schedules." / "handle carriers" / 1.0
    • "This module focuses on automating ASP.NET deployments using Octopus Deploy." / "Octopus Deploy" / 1.0
    • "The team will transport the necessary tools to the designated location." / "move rigging equipment" / 1.0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
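
Since the label column is constant at 1.0, the positives are the (sentence_0, sentence_1) pairs themselves, and with MultipleNegativesRankingLoss the other sentence_1 entries in a batch act as in-batch negatives. A minimal sketch of how this loss could be configured with the parameters listed above, assuming the standard Sentence Transformers v3 training API:

from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.util import cos_sim

# Start from the base model named in this card; the finetuned weights are the result of training.
model = SentenceTransformer("sentence-transformers/gtr-t5-base")

# scale=20.0 and cosine similarity match the parameters listed above
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)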
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
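
A hedged sketch of how these non-default values map onto SentenceTransformerTrainingArguments; the output_dir value is illustrative and not taken from this card, while num_train_epochs comes from the full list below:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/esco-gtr-t5",  # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    fp16=True,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)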

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0112 500 0.0
0.0224 1000 0.0
0.0336 1500 0.0
0.0448 2000 0.0
0.0560 2500 0.0
0.0673 3000 0.0
0.0785 3500 0.0
0.0897 4000 0.0
0.1009 4500 0.0
0.1121 5000 0.0
0.1233 5500 0.0
0.1345 6000 0.0
0.1457 6500 0.0
0.1569 7000 0.0
0.1681 7500 0.0
0.1794 8000 0.0
0.1906 8500 0.0
0.2018 9000 0.0
0.2130 9500 0.0
0.2242 10000 0.0
0.2354 10500 0.0
0.2466 11000 0.0
0.2578 11500 0.0
0.2690 12000 0.0
0.2802 12500 0.0
0.2915 13000 0.0
0.3027 13500 0.0
0.3139 14000 0.0
0.3251 14500 0.0
0.3363 15000 0.0
0.3475 15500 0.0
0.3587 16000 0.0
0.3699 16500 0.0
0.3811 17000 0.0
0.3923 17500 0.0
0.4036 18000 0.0
0.4148 18500 0.0
0.4260 19000 0.0
0.4372 19500 0.0
0.4484 20000 0.0
0.4596 20500 0.0
0.4708 21000 0.0
0.4820 21500 0.0
0.4932 22000 0.0
0.5044 22500 0.0
0.5156 23000 0.0
0.5269 23500 0.0
0.5381 24000 0.0
0.5493 24500 0.0
0.5605 25000 0.0
0.5717 25500 0.0
0.5829 26000 0.0
0.5941 26500 0.0
0.6053 27000 0.0
0.6165 27500 0.0
0.6277 28000 0.0
0.6390 28500 0.0
0.6502 29000 0.0
0.6614 29500 0.0
0.6726 30000 0.0
0.6838 30500 0.0
0.6950 31000 0.0
0.7062 31500 0.0
0.7174 32000 0.0
0.7286 32500 0.0
0.7398 33000 0.0
0.7511 33500 0.0
0.7623 34000 0.0
0.7735 34500 0.0
0.7847 35000 0.0
0.7959 35500 0.0
0.8071 36000 0.0
0.8183 36500 0.0
0.8295 37000 0.0
0.8407 37500 0.0
0.8519 38000 0.0
0.8632 38500 0.0
0.8744 39000 0.0
0.8856 39500 0.0
0.8968 40000 0.0
0.9080 40500 0.0
0.9192 41000 0.0
0.9304 41500 0.0
0.9416 42000 0.0
0.9528 42500 0.0
0.9640 43000 0.0
0.9752 43500 0.0
0.9865 44000 0.0
0.9977 44500 0.0
1.0089 45000 0.0
1.0201 45500 0.0
1.0313 46000 0.0
1.0425 46500 0.0
1.0537 47000 0.0
1.0649 47500 0.0
1.0761 48000 0.0
1.0873 48500 0.0
1.0986 49000 0.0
1.1098 49500 0.0
1.1210 50000 0.0
1.1322 50500 0.0
1.1434 51000 0.0
1.1546 51500 0.0
1.1658 52000 0.0
1.1770 52500 0.0
1.1882 53000 0.0
1.1994 53500 0.0
1.2107 54000 0.0
1.2219 54500 0.0
1.2331 55000 0.0
1.2443 55500 0.0
1.2555 56000 0.0
1.2667 56500 0.0
1.2779 57000 0.0
1.2891 57500 0.0
1.3003 58000 0.0
1.3115 58500 0.0
1.3228 59000 0.0
1.3340 59500 0.0
1.3452 60000 0.0
1.3564 60500 0.0
1.3676 61000 0.0
1.3788 61500 0.0
1.3900 62000 0.0
1.4012 62500 0.0
1.4124 63000 0.0
1.4236 63500 0.0
1.4348 64000 0.0
1.4461 64500 0.0
1.4573 65000 0.0
1.4685 65500 0.0
1.4797 66000 0.0
1.4909 66500 0.0
1.5021 67000 0.0
1.5133 67500 0.0
1.5245 68000 0.0
1.5357 68500 0.0
1.5469 69000 0.0
1.5582 69500 0.0
1.5694 70000 0.0
1.5806 70500 0.0
1.5918 71000 0.0
1.6030 71500 0.0
1.6142 72000 0.0
1.6254 72500 0.0
1.6366 73000 0.0
1.6478 73500 0.0
1.6590 74000 0.0
1.6703 74500 0.0
1.6815 75000 0.0
1.6927 75500 0.0
1.7039 76000 0.0
1.7151 76500 0.0
1.7263 77000 0.0
1.7375 77500 0.0
1.7487 78000 0.0
1.7599 78500 0.0
1.7711 79000 0.0
1.7824 79500 0.0
1.7936 80000 0.0
1.8048 80500 0.0
1.8160 81000 0.0
1.8272 81500 0.0
1.8384 82000 0.0
1.8496 82500 0.0
1.8608 83000 0.0
1.8720 83500 0.0
1.8832 84000 0.0
1.8944 84500 0.0
1.9057 85000 0.0
1.9169 85500 0.0
1.9281 86000 0.0
1.9393 86500 0.0
1.9505 87000 0.0
1.9617 87500 0.0
1.9729 88000 0.0
1.9841 88500 0.0
1.9953 89000 0.0
2.0065 89500 0.0
2.0178 90000 0.0
2.0290 90500 0.0
2.0402 91000 0.0
2.0514 91500 0.0
2.0626 92000 0.0
2.0738 92500 0.0
2.0850 93000 0.0
2.0962 93500 0.0
2.1074 94000 0.0
2.1186 94500 0.0
2.1299 95000 0.0
2.1411 95500 0.0
2.1523 96000 0.0
2.1635 96500 0.0
2.1747 97000 0.0
2.1859 97500 0.0
2.1971 98000 0.0
2.2083 98500 0.0
2.2195 99000 0.0
2.2307 99500 0.0
2.2420 100000 0.0
2.2532 100500 0.0
2.2644 101000 0.0
2.2756 101500 0.0
2.2868 102000 0.0
2.2980 102500 0.0
2.3092 103000 0.0
2.3204 103500 0.0
2.3316 104000 0.0
2.3428 104500 0.0
2.3540 105000 0.0
2.3653 105500 0.0
2.3765 106000 0.0
2.3877 106500 0.0
2.3989 107000 0.0
2.4101 107500 0.0
2.4213 108000 0.0
2.4325 108500 0.0
2.4437 109000 0.0
2.4549 109500 0.0
2.4661 110000 0.0
2.4774 110500 0.0
2.4886 111000 0.0
2.4998 111500 0.0
2.5110 112000 0.0
2.5222 112500 0.0
2.5334 113000 0.0
2.5446 113500 0.0
2.5558 114000 0.0
2.5670 114500 0.0
2.5782 115000 0.0
2.5895 115500 0.0
2.6007 116000 0.0
2.6119 116500 0.0
2.6231 117000 0.0
2.6343 117500 0.0
2.6455 118000 0.0
2.6567 118500 0.0
2.6679 119000 0.0
2.6791 119500 0.0
2.6903 120000 0.0
2.7016 120500 0.0
2.7128 121000 0.0
2.7240 121500 0.0
2.7352 122000 0.0
2.7464 122500 0.0
2.7576 123000 0.0
2.7688 123500 0.0
2.7800 124000 0.0
2.7912 124500 0.0
2.8024 125000 0.0
2.8136 125500 0.0
2.8249 126000 0.0
2.8361 126500 0.0
2.8473 127000 0.0
2.8585 127500 0.0
2.8697 128000 0.0
2.8809 128500 0.0
2.8921 129000 0.0
2.9033 129500 0.0
2.9145 130000 0.0
2.9257 130500 0.0
2.9370 131000 0.0
2.9482 131500 0.0
2.9594 132000 0.0
2.9706 132500 0.0
2.9818 133000 0.0
2.9930 133500 0.0

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.5.1
  • Datasets: 3.4.0
  • Tokenizers: 0.21.0
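
To approximate this environment, the library versions above can be pinned at install time; this is an illustrative command, not part of the original card, and PyTorch 2.5.1 (cu121 build) is installed separately by following the PyTorch instructions:

pip install "sentence-transformers==3.4.1" "transformers==4.49.0" "accelerate==1.5.1" "datasets==3.4.0" "tokenizers==0.21.0"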

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}