SentenceTransformer based on intfloat/e5-base-unsupervised

This is a sentence-transformers model finetuned from intfloat/e5-base-unsupervised. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/e5-base-unsupervised
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
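
The Pooling module above mean-pools the token embeddings (pooling_mode_mean_tokens: True) into a single 768-dimensional vector. As a minimal sketch, the same embedding can be reproduced with plain transformers; this assumes the repository loads directly with AutoModel, which is typical for Sentence Transformers checkpoints but not guaranteed for this one:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bobox/E5-base-unsupervised-TSDAE-2")
encoder = AutoModel.from_pretrained("bobox/E5-base-unsupervised-TSDAE-2")

batch = tokenizer(
    ["where are ligand gated ion channels located?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: average the token vectors, using the attention mask so that
# padding positions do not contribute to the mean
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])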

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/E5-base-unsupervised-TSDAE-2")
# Run inference
sentences = [
    'ligand ion channels located?',
    'where are ligand gated ion channels located?',
    "Duvets tend to be warm but surprisingly lightweight. The duvet cover makes it easier to change bedding looks and styles. You won't need to wash your duvet very often, just wash the cover regularly. Additionally, duvets tend to be fluffier than comforters, and can simplify bed making if you choose the European style.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
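
Because the similarity function is cosine, semantic search is just ranking by these scores. Continuing the example above (the expected ranking is an assumption based on the sentences' content):

# Treat the first sentence as a query and rank the other two against it
query_emb = model.encode([sentences[0]])
doc_emb = model.encode(sentences[1:])
scores = model.similarity(query_emb, doc_emb)  # shape [1, 2], cosine scores
# The paraphrased question should rank above the unrelated duvet passage
print(sentences[1:][int(scores.argmax())])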

Evaluation

Metrics

Semantic Similarity

Dataset: sts-test

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.7652 |
| spearman_cosine    | 0.7525 |
| pearson_manhattan  | 0.7393 |
| spearman_manhattan | 0.7326 |
| pearson_euclidean  | 0.7402 |
| spearman_euclidean | 0.7335 |
| pearson_dot        | 0.5003 |
| spearman_dot       | 0.4986 |
| pearson_max        | 0.7652 |
| spearman_max       | 0.7525 |
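
These are the standard outputs of the Sentence Transformers EmbeddingSimilarityEvaluator: Pearson and Spearman correlations between gold similarity labels and the model's scores under cosine, Manhattan, Euclidean, and dot-product scoring ("max" reports the best of the four). A sketch of recomputing them; the evaluator name in the training logs is sts-test, so the STS benchmark test split is assumed here:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bobox/E5-base-unsupervised-TSDAE-2")

# Assumed evaluation data: STS benchmark test split, gold scores in [0, 1]
stsb = load_dataset("sentence-transformers/stsb", split="test")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    main_similarity=SimilarityFunction.COSINE,
    name="sts-test",
)
print(evaluator(model))  # dict of pearson/spearman values per similarity function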

Training Details

Training Dataset

Unnamed Dataset

  • Size: 700,000 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:

    |         | sentence_0                                        | sentence_1                                         |
    |:--------|:--------------------------------------------------|:---------------------------------------------------|
    | type    | string                                            | string                                             |
    | details | min: 3 tokens, mean: 15.73 tokens, max: 55 tokens | min: 8 tokens, mean: 36.05 tokens, max: 131 tokens |

  • Samples (sentence_0 is a noise-corrupted version of sentence_1, with tokens randomly deleted to create the input for the denoising objective):

    | sentence_0 | sentence_1 |
    |:-----------|:-----------|
    | Quality such a has components with applicable high objective system measure component improvements | Quality in such a system has three components: high accuracy, compliance with applicable standards, and high customer satisfaction. The objective of the system is to measure each component and achieve improvements. |
    | include | does qbi include capital gains? |
    | They have a . parietal is in, as becomes and pigments after four to is believed and in circadian cycles | They have a third eye. The parietal eye is only visible in hatchlings, as it becomes covered in scales and pigments after four to six months. Its function is a subject of ongoing research, but it is believed to be useful in absorbing ultraviolet rays and in setting circadian and seasonal cycles. |
  • Loss: DenoisingAutoEncoderLoss
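
DenoisingAutoEncoderLoss attaches a decoder to the encoder and trains it to reconstruct the original sentence (sentence_1) from the embedding of its corrupted version (sentence_0). A minimal sketch of this setup, assuming the classic model.fit API and an illustrative two-sentence corpus (the actual 700,000-sentence corpus is not published with this card):

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer
from sentence_transformers.datasets import DenoisingAutoEncoderDataset
from sentence_transformers.losses import DenoisingAutoEncoderLoss

model = SentenceTransformer("intfloat/e5-base-unsupervised")

# Illustrative corpus; the real run used 700,000 sentences
corpus = [
    "does qbi include capital gains?",
    "They have a third eye.",
]

# DenoisingAutoEncoderDataset corrupts each sentence (by default deleting
# about 60% of its tokens) and yields (noisy, original) pairs like the
# sentence_0 / sentence_1 samples above
train_dataset = DenoisingAutoEncoderDataset(corpus)
train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# The decoder's weights are tied to the encoder's; after training the
# decoder is discarded and only the encoder is kept
train_loss = DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=2)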

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin
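
For reference, a sketch of how these non-default values map onto the Sentence Transformers v3 training arguments (output_dir is an assumption; the other values are taken from the list above):

from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    MultiDatasetBatchSamplers,
)

args = SentenceTransformerTrainingArguments(
    output_dir="E5-base-unsupervised-TSDAE-2",  # assumption, not from the card
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)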

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss sts-test_spearman_cosine
0 0 - 0.7211
0.0114 500 9.4957 -
0.0229 1000 7.4063 -
0.0343 1500 7.0225 -
0.0457 2000 6.6991 -
0.0571 2500 6.4054 -
0.0686 3000 6.1933 -
0.08 3500 5.999 -
0.0914 4000 5.8471 -
0.1 4375 - 0.4610
0.1029 4500 5.6876 -
0.1143 5000 5.5934 -
0.1257 5500 5.4877 -
0.1371 6000 5.4034 -
0.1486 6500 5.3016 -
0.16 7000 5.2169 -
0.1714 7500 5.1351 -
0.1829 8000 5.0605 -
0.1943 8500 4.9851 -
0.2 8750 - 0.6490
0.2057 9000 4.9024 -
0.2171 9500 4.8722 -
0.2286 10000 4.7955 -
0.24 10500 4.7435 -
0.2514 11000 4.6742 -
0.2629 11500 4.6447 -
0.2743 12000 4.5964 -
0.2857 12500 4.5186 -
0.2971 13000 4.5024 -
0.3 13125 - 0.7121
0.3086 13500 4.4336 -
0.32 14000 4.3767 -
0.3314 14500 4.3454 -
0.3429 15000 4.3067 -
0.3543 15500 4.2627 -
0.3657 16000 4.2323 -
0.3771 16500 4.208 -
0.3886 17000 4.1622 -
0.4 17500 4.113 0.7375
0.4114 18000 4.1097 -
0.4229 18500 4.0666 -
0.4343 19000 4.0311 -
0.4457 19500 4.0241 -
0.4571 20000 3.9991 -
0.4686 20500 3.9873 -
0.48 21000 3.9439 -
0.4914 21500 3.9281 -
0.5 21875 - 0.7502
0.5029 22000 3.9047 -
0.5143 22500 3.89 -
0.5257 23000 3.8671 -
0.5371 23500 3.85 -
0.5486 24000 3.8336 -
0.56 24500 3.8081 -
0.5714 25000 3.8049 -
0.5829 25500 3.7587 -
0.5943 26000 3.769 -
0.6 26250 - 0.7530
0.6057 26500 3.7488 -
0.6171 27000 3.7218 -
0.6286 27500 3.7128 -
0.64 28000 3.7104 -
0.6514 28500 3.6706 -
0.6629 29000 3.6602 -
0.6743 29500 3.658 -
0.6857 30000 3.665 -
0.6971 30500 3.6439 -
0.7 30625 - 0.7561
0.7086 31000 3.6411 -
0.72 31500 3.6141 -
0.7314 32000 3.6172 -
0.7429 32500 3.5975 -
0.7543 33000 3.5827 -
0.7657 33500 3.5836 -
0.7771 34000 3.5484 -
0.7886 34500 3.5275 -
0.8 35000 3.5587 0.7553
0.8114 35500 3.5371 -
0.8229 36000 3.5334 -
0.8343 36500 3.5168 -
0.8457 37000 3.483 -
0.8571 37500 3.4755 -
0.8686 38000 3.4943 -
0.88 38500 3.4699 -
0.8914 39000 3.4732 -
0.9 39375 - 0.7560
0.9029 39500 3.4572 -
0.9143 40000 3.4518 -
0.9257 40500 3.4298 -
0.9371 41000 3.4215 -
0.9486 41500 3.4176 -
0.96 42000 3.4353 -
0.9714 42500 3.4137 -
0.9829 43000 3.4037 -
0.9943 43500 3.4157 -
1.0 43750 - 0.7554
1.0057 44000 3.393 -
1.0171 44500 3.4092 -
1.0286 45000 3.3861 -
1.04 45500 3.3976 -
1.0514 46000 3.3769 -
1.0629 46500 3.3444 -
1.0743 47000 3.3598 -
1.0857 47500 3.3556 -
1.0971 48000 3.3548 -
1.1 48125 - 0.7549
1.1086 48500 3.3278 -
1.12 49000 3.3309 -
1.1314 49500 3.3459 -
1.1429 50000 3.3353 -
1.1543 50500 3.3192 -
1.1657 51000 3.3022 -
1.1771 51500 3.3189 -
1.1886 52000 3.301 -
1.2 52500 3.2785 0.7542
1.2114 53000 3.2996 -
1.2229 53500 3.2863 -
1.2343 54000 3.2916 -
1.2457 54500 3.272 -
1.2571 55000 3.2896 -
1.2686 55500 3.2694 -
1.28 56000 3.2848 -
1.2914 56500 3.2528 -
1.3 56875 - 0.7554
1.3029 57000 3.2622 -
1.3143 57500 3.2515 -
1.3257 58000 3.2385 -
1.3371 58500 3.2341 -
1.3486 59000 3.2275 -
1.3600 59500 3.2538 -
1.3714 60000 3.2329 -
1.3829 60500 3.2322 -
1.3943 61000 3.2039 -
1.4 61250 - 0.7530
1.4057 61500 3.212 -
1.4171 62000 3.2127 -
1.4286 62500 3.1956 -
1.44 63000 3.202 -
1.4514 63500 3.2046 -
1.4629 64000 3.2105 -
1.4743 64500 3.1915 -
1.4857 65000 3.176 -
1.4971 65500 3.1852 -
1.5 65625 - 0.7541
1.5086 66000 3.1988 -
1.52 66500 3.1714 -
1.5314 67000 3.1816 -
1.5429 67500 3.1745 -
1.5543 68000 3.1674 -
1.5657 68500 3.1887 -
1.5771 69000 3.1567 -
1.5886 69500 3.1775 -
1.6 70000 3.1696 0.7535
1.6114 70500 3.154 -
1.6229 71000 3.1553 -
1.6343 71500 3.1675 -
1.6457 72000 3.1516 -
1.6571 72500 3.1569 -
1.6686 73000 3.1403 -
1.6800 73500 3.1667 -
1.6914 74000 3.1545 -
1.7 74375 - 0.7529
1.7029 74500 3.1736 -
1.7143 75000 3.1447 -
1.7257 75500 3.1567 -
1.7371 76000 3.1682 -
1.7486 76500 3.149 -
1.76 77000 3.1522 -
1.7714 77500 3.1412 -
1.7829 78000 3.1268 -
1.7943 78500 3.1476 -
1.8 78750 - 0.7524
1.8057 79000 3.1669 -
1.8171 79500 3.1432 -
1.8286 80000 3.1603 -
1.8400 80500 3.1347 -
1.8514 81000 3.1209 -
1.8629 81500 3.1302 -
1.8743 82000 3.1423 -
1.8857 82500 3.1481 -
1.8971 83000 3.1262 -
1.9 83125 - 0.7525
1.9086 83500 3.1484 -
1.92 84000 3.1331 -
1.9314 84500 3.122 -
1.9429 85000 3.1272 -
1.9543 85500 3.1435 -
1.9657 86000 3.1431 -
1.9771 86500 3.1457 -
1.9886 87000 3.1286 -
2.0 87500 3.1352 0.7525

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2
  • Accelerate: 0.31.0
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1
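
To approximate this environment, the versions above can be pinned at install time (a convenience suggestion, not part of the original card):

pip install sentence-transformers==3.0.1 transformers==4.41.2 torch==2.1.2 accelerate==0.31.0 datasets==2.19.2 tokenizers==0.19.1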

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna", 
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}