word-order-jina / README.md
metadata
language:
  - en
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:11002
  - loss:MultipleNegativesRankingLoss
base_model: jinaai/jina-embeddings-v2-base-en
widget:
  - source_sentence: Man jumps alone on a desert road with mountains in the background.
    sentences:
      - A man jumps on the desert road
      - A man plays a silver electric guitar.
      - A man doesnt jump on the desert road
  - source_sentence: Players from two teams tangle together in pursuit of a flying rugby ball.
    sentences:
      - Two teams playing.
      - Two teams not playing.
      - Men are dancing in the street.
  - source_sentence: The team won the game in the final minute.
    sentences:
      - In the final minute, the team won the game.
      - The team lost the game in the final minute.
      - >-
        For their anniversary, they took a hike through the mountains, enjoying
        the peace and quiet of nature.
  - source_sentence: He finished reading the book in one sitting.
    sentences:
      - He struggled to finish the book and took a week to read it.
      - In one sitting, he finished reading the book.
      - jazz players create spontaneous superior orchestra
  - source_sentence: Paint preserves wood
    sentences:
      - Coating protects timber
      - timber coating protects
      - Single cell life came before complex creatures
datasets:
  - bwang0911/word-orders-triplet
  - jinaai/negation-dataset
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on jinaai/jina-embeddings-v2-base-en

This is a sentence-transformers model fine-tuned from jinaai/jina-embeddings-v2-base-en on the word_orders and negation_dataset datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: JinaBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
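
The Pooling and Normalize modules listed above amount to a masked mean over the token embeddings followed by L2 normalization. The following is a minimal, dependency-free sketch of those two steps, assuming the per-token vectors and an attention mask are already available. The function names `mean_pool` and `l2_normalize` are illustrative and not part of the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length, mirroring what a normalization layer does."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: three token vectors (dimension 2 instead of 768); the last is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

The padded position is ignored by the mean, and the resulting vector has unit length, which is why cosine similarity and dot product coincide for this model's outputs.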

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

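# Download from the 🤗 Hub.
# JinaBertModel is a custom architecture, so loading may require trust_remote_code=True.
model = SentenceTransformer("bwang0911/word-order-jina", trust_remote_code=True)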
# Run inference
sentences = [
    'Paint preserves wood',
    'Coating protects timber',
    'timber coating protects',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
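
Because the architecture ends with a Normalize() layer, the cosine similarity between two embeddings reduces to their dot product. The sketch below uses made-up 2-dimensional unit vectors rather than real model output. It shows one way a similarity matrix like the one computed above could be used to check that an anchor scores its paraphrase higher than a reordered distractor. The helpers `dot`, `similarity_matrix`, and `prefers_positive` are hypothetical names introduced for illustration.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def similarity_matrix(embeddings):
    """Pairwise similarities, analogous in shape to model.similarity(e, e)."""
    return [[dot(a, b) for b in embeddings] for a in embeddings]

def prefers_positive(sim, anchor=0, positive=1, negative=2):
    """True if the anchor is closer to the positive than to the negative."""
    return sim[anchor][positive] > sim[anchor][negative]

# Placeholder unit vectors standing in for [anchor, paraphrase, reordered text].
# These numbers are invented for illustration; they are not model outputs.
toy = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
sim = similarity_matrix(toy)
print(prefers_positive(sim))
```

With real embeddings, the same check can be applied to the rows of the matrix returned by `model.similarity`.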

Training Details

Training Datasets

word_orders

  • Dataset: word_orders at 99609ac
  • Size: 1,002 training samples
  • Columns: anchor, pos, and neg
  • Approximate statistics based on the first 1000 samples:

    |         | anchor | pos | neg |
    |---------|--------|-----|-----|
    | type    | string | string | string |
    | details | min: 5 tokens, mean: 12.34 tokens, max: 32 tokens | min: 5 tokens, mean: 12.1 tokens, max: 30 tokens | min: 5 tokens, mean: 11.51 tokens, max: 24 tokens |

  • Samples:

    | anchor | pos | neg |
    |--------|-----|-----|
    | The river flows from the mountains to the sea | Water travels from mountain peaks to ocean | The river flows from the sea to the mountains |
    | Train departs London for Paris | Railway journey from London heading to Paris | Train departs Paris for London |
    | Cargo ship sails from Shanghai to Singapore | Maritime route Shanghai to Singapore | Cargo ship sails from Singapore to Shanghai |

  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20,
        "similarity_fct": "cos_sim"
    }
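
The loss configuration above (scale 20, cosine similarity) can be read as a cross-entropy objective in which each anchor has to pick its own positive out of every positive and negative in the batch. The following dependency-free sketch follows that general formulation from Henderson et al. as an illustration only. It is not the library's implementation, and `cos_sim` and `mnr_loss` are names chosen for this example.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must select positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus hard negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy batch of two triplets with 2-dimensional embeddings (values are made up).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
negatives = [[0.5, 0.5], [0.5, 0.5]]
loss_good = mnr_loss(anchors, positives, negatives)
loss_bad = mnr_loss(anchors, negatives, positives)  # positives and negatives swapped
print(loss_good < loss_bad)
```

The scale factor sharpens the softmax: with cosine similarities bounded in [-1, 1], multiplying by 20 lets a well-separated positive dominate the candidate distribution.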
    

negation_dataset

  • Dataset: negation_dataset at cd02256
  • Size: 10,000 training samples
  • Columns: anchor, entailment, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor | entailment | negative |
    |---------|--------|------------|----------|
    | type    | string | string | string |
    | details | min: 6 tokens, mean: 16.48 tokens, max: 44 tokens | min: 4 tokens, mean: 9.63 tokens, max: 31 tokens | min: 5 tokens, mean: 10.46 tokens, max: 32 tokens |

  • Samples:

    | anchor | entailment | negative |
    |--------|------------|----------|
    | Two young girls are playing outside in a non-urban environment. | Two girls are playing outside. | Two girls are not playing outside. |
    | A man with a red shirt is watching another man who is standing on top of a attached cart filled to the top. | A man is standing on top of a cart. | A man is not standing on top of a cart. |
    | A man in a blue shirt driving a Segway type vehicle. | A person is riding a motorized vehicle. | A person is not riding a motorized vehicle. |

  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
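
The combination of lr_scheduler_type: linear, warmup_ratio: 0.1, and learning_rate: 5e-05 describes a learning rate that ramps up linearly and then decays linearly to zero. The sketch below illustrates that shape. It assumes 261 total optimizer steps, a figure inferred from the training log (step 260 falls at epoch 2.9885 of 3) rather than stated in the card, and `linear_lr` is a hypothetical helper, not a library function.

```python
import math

def linear_lr(step, total_steps=261, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate after `step` updates under linear warmup and linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Print the rate at a few points: start, mid-warmup, peak, mid-decay, end.
for step in (0, 10, 27, 130, 261):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate is reached after 27 steps, roughly a third of the way through the first epoch.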

Training Logs

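| Epoch  | Step | Training Loss |
|--------|------|---------------|
| 0.1149 | 10   | 2.0411 |
| 0.2299 | 20   | 1.5167 |
| 0.3448 | 30   | 0.64   |
| 0.4598 | 40   | 0.6058 |
| 0.5747 | 50   | 0.6042 |
| 0.6897 | 60   | 0.4193 |
| 0.8046 | 70   | 0.5208 |
| 0.9195 | 80   | 0.4864 |
| 1.0345 | 90   | 0.4145 |
| 1.1494 | 100  | 0.69   |
| 1.2644 | 110  | 0.9602 |
| 1.3793 | 120  | 0.2539 |
| 1.4943 | 130  | 0.2558 |
| 1.6092 | 140  | 0.2769 |
| 1.7241 | 150  | 0.2154 |
| 1.8391 | 160  | 0.293  |
| 1.9540 | 170  | 0.2598 |
| 2.0690 | 180  | 0.2113 |
| 2.1839 | 190  | 0.9366 |
| 2.2989 | 200  | 0.2121 |
| 2.4138 | 210  | 0.1486 |
| 2.5287 | 220  | 0.1765 |
| 2.6437 | 230  | 0.1438 |
| 2.7586 | 240  | 0.1589 |
| 2.8736 | 250  | 0.1869 |
| 2.9885 | 260  | 0.1682 |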
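
The fractional epochs in the log can be cross-checked against the dataset sizes and batch size reported earlier. The sketch below assumes that every batch is drawn from a single dataset, that incomplete final batches are kept (dataloader_drop_last: False), and that the no_duplicates sampler does not discard samples. These are assumptions about the batching behaviour, not statements from the card, and `steps_per_epoch` is a name chosen for this example.

```python
import math

def steps_per_epoch(dataset_sizes, batch_size):
    """Batches per epoch when each dataset is batched separately, partial batches kept."""
    return sum(math.ceil(n / batch_size) for n in dataset_sizes)

sizes = {"word_orders": 1002, "negation_dataset": 10000}
per_epoch = steps_per_epoch(sizes.values(), batch_size=128)  # 8 + 79 batches
total_steps = per_epoch * 3  # num_train_epochs: 3

# Compare with (step, epoch) pairs taken from the training log.
for step, logged_epoch in [(10, 0.1149), (90, 1.0345), (260, 2.9885)]:
    assert round(step / per_epoch, 4) == logged_epoch
print(per_epoch, total_steps)
```

Under these assumptions the arithmetic gives 87 steps per epoch and 261 steps in total, which reproduces the logged epoch values to four decimal places.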

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}