---
language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:185814
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: ' The passage suggests that the more utility an item has, the more value human beings assign to it, making utility synonymous with subjective human value.'
    sentences:
      - |2
         In the given passage about compound interest, how does the interest earned on a Series EE bond affect its value over time? 
      - |2
         What forms does Section 16 require insiders to file and when is Form 3 typically submitted? 
      - |2
         What is the relationship between an item's utility and its subjective human value according to the passage? 
  - source_sentence: ' The price per share is determined when a company goes public by giving a valuation to the company with the input of an investment bank. This value is then divided by the total number of shares to be issued.'
    sentences:
      - |2
         How is the price per share determined when a company goes public and involves an investment bank? 
      - |2
         What percentage decrease have Fisker shares experienced in the past year despite a 30% increase on Feb. 27?
      - |2
         What factors contributed to the strong performance and outstanding returns of residential construction stocks since the March lows as mentioned in the passage? 
  - source_sentence: ' Municipal bonds are discussed as this type of investment.'
    sentences:
      - |2
         What is the benefit and process of using an income-driven repayment (IDR) plan for federal student loans? 
      - |2
         Which luxury watch and jewelry brands are owned by Swatch?
      - |2
         What type of investment is discussed as a way to potentially increase after-tax returns by avoiding federal taxes, and is often chosen for its relative safety and steady return? 
  - source_sentence: ' The rally could potentially fill the Sept. 18 gap between $145 and $150, reaching the .618 sell-off retracement level. The on-balance volume indicator suggests that Roku stock is unlikely to test the September high at this time.'
    sentences:
      - |2
         What rewards and perks can Navy Federal Visa Signature Flagship Rewards credit card users receive upon opening an account and within the first 90 days? 
      - |2
         What aspects contribute to VeriFirst's high marks for comprehensiveness in their services? 
      - |2
         What levels could the rally potentially fill after buyers buy the dip into $120, and what does the on-balance volume indicator suggest about Roku stock's likelihood to test the September high at this time? 
  - source_sentence: ' Home Depot''s stock closed at $135.39 while being above a "golden cross" on January 19, 2017.'
    sentences:
      - |2
         In the given text passage, when did Home Depot's stock close at $135.39 while being above a "golden cross"? 
      - |2
         What term does JPMorgan use to refer to net interest margin in its financial materials, and what was their net interest margin in FY 2019 before the pandemic started? 
      - |2
         According to Maley, where might the funds from potentially declining sectors like FANGs be directed towards? 
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: Regulatory Financial Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.6027025718021989
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7349251707269822
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7675691383736136
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8058313556448878
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6027025718021989
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.24497505690899404
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1535138276747227
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0805831355644888
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.6027025718021989
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7349251707269822
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.7675691383736136
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8058313556448878
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.7073258915973659
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6754839282543154
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6796950515367028
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.5988763500750715
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7302271516443066
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7651474790526469
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8033612631375018
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5988763500750715
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2434090505481022
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.15302949581052935
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.08033612631375019
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5988763500750715
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7302271516443066
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.7651474790526469
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.8033612631375018
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.703957174859045
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6718512470776807
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6760676798344978
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.5881241826899791
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7223325422579552
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.756768537802102
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7946917227684409
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5881241826899791
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.24077751408598502
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1513537075604204
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.07946917227684411
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5881241826899791
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7223325422579552
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.756768537802102
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7946917227684409
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.694546619247058
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6621607466706108
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6665568671650335
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.5718990652395021
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7057683925025428
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7405918535380442
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7818569283673172
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5718990652395021
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2352561308341809
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.14811837070760883
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.07818569283673171
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5718990652395021
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.7057683925025428
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.7405918535380442
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7818569283673172
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6793129712551184
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6462535008352889
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6508176832915368
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.5443405821669007
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.6759335496682327
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.7138567346345716
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.7583183997675207
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.5443405821669007
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.22531118322274424
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.14277134692691432
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.07583183997675207
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.5443405821669007
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.6759335496682327
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.7138567346345716
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.7583183997675207
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6522706632460163
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6182239473662035
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6229041572175256
            name: Cosine Map@100
---

Regulatory Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
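
The Pooling and Normalize modules mean every text is CLS-pooled and then L2-normalized, so cosine similarity and dot product produce the same ranking. A minimal sketch to confirm the unit-norm property (the example sentence is taken from the widget examples above):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("hshashank06/regulatory-model")
emb = model.encode(["Municipal bonds are discussed as this type of investment."])
# The Normalize() module L2-normalizes the CLS-pooled output, so each vector has unit length
print(emb.shape)                    # (1, 768)
print(np.linalg.norm(emb, axis=1))  # [1.]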

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hshashank06/regulatory-model")
# Run inference
sentences = [
    ' Home Depot\'s stock closed at $135.39 while being above a "golden cross" on January 19, 2017.',
    ' In the given text passage, when did Home Depot\'s stock close at $135.39 while being above a "golden cross"? \n',
    ' According to Maley, where might the funds from potentially declining sectors like FANGs be directed towards? \n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
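
Because the model was trained with MatryoshkaLoss at dimensions 768, 512, 256, 128, and 64, you can trade some retrieval quality (see the evaluation table below) for much smaller vectors by truncating the embeddings. A minimal sketch, assuming the truncate_dim argument available in recent sentence-transformers releases:

from sentence_transformers import SentenceTransformer

# Keep only the first 256 Matryoshka dimensions of each embedding
model_256 = SentenceTransformer("hshashank06/regulatory-model", truncate_dim=256)

embeddings = model_256.encode([
    "Which luxury watch and jewelry brands are owned by Swatch?",
])
print(embeddings.shape)
# (1, 256)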

Evaluation

Metrics

Information Retrieval

| Metric              | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--------------------|--------:|--------:|--------:|--------:|-------:|
| cosine_accuracy@1   | 0.6027  | 0.5989  | 0.5881  | 0.5719  | 0.5443 |
| cosine_accuracy@3   | 0.7349  | 0.7302  | 0.7223  | 0.7058  | 0.6759 |
| cosine_accuracy@5   | 0.7676  | 0.7651  | 0.7568  | 0.7406  | 0.7139 |
| cosine_accuracy@10  | 0.8058  | 0.8034  | 0.7947  | 0.7819  | 0.7583 |
| cosine_precision@1  | 0.6027  | 0.5989  | 0.5881  | 0.5719  | 0.5443 |
| cosine_precision@3  | 0.245   | 0.2434  | 0.2408  | 0.2353  | 0.2253 |
| cosine_precision@5  | 0.1535  | 0.153   | 0.1514  | 0.1481  | 0.1428 |
| cosine_precision@10 | 0.0806  | 0.0803  | 0.0795  | 0.0782  | 0.0758 |
| cosine_recall@1     | 0.6027  | 0.5989  | 0.5881  | 0.5719  | 0.5443 |
| cosine_recall@3     | 0.7349  | 0.7302  | 0.7223  | 0.7058  | 0.6759 |
| cosine_recall@5     | 0.7676  | 0.7651  | 0.7568  | 0.7406  | 0.7139 |
| cosine_recall@10    | 0.8058  | 0.8034  | 0.7947  | 0.7819  | 0.7583 |
| cosine_ndcg@10      | 0.7073  | 0.704   | 0.6945  | 0.6793  | 0.6523 |
| cosine_mrr@10       | 0.6755  | 0.6719  | 0.6622  | 0.6463  | 0.6182 |
| cosine_map@100      | 0.6797  | 0.6761  | 0.6666  | 0.6508  | 0.6229 |
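
These are the metric names reported by sentence-transformers' InformationRetrievalEvaluator, evaluated once per Matryoshka dimension. A hedged sketch of how a comparable evaluation could be run (the queries, corpus, and relevant_docs below are tiny placeholders, not the actual evaluation split):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("hshashank06/regulatory-model")

# Placeholder data: query id -> text, doc id -> text, query id -> set of relevant doc ids
queries = {"q1": "What is the BVPS and how is it calculated?"}
corpus = {
    "d1": "The BVPS (Book Value Per Share) is calculated by dividing a company's common equity value by its total number of shares outstanding.",
    "d2": "Naive diversification is a strategy where an investor randomly selects different securities.",
    "d3": "Municipal bonds are often chosen for their relative safety and steady return.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_256",
    truncate_dim=256,  # evaluate on the first 256 embedding dimensions
)
results = evaluator(model)
print(results)  # cosine_accuracy@k, cosine_precision@k, cosine_recall@k, cosine_ndcg@10, cosine_mrr@10, cosine_map@100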

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 185,814 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    |         | positive                                           | anchor                                             |
    |:--------|:---------------------------------------------------|:----------------------------------------------------|
    | type    | string                                              | string                                               |
    | details | min: 3 tokens, mean: 43.18 tokens, max: 200 tokens  | min: 10 tokens, mean: 23.08 tokens, max: 63 tokens   |
  • Samples:
    | positive | anchor |
    |:---------|:-------|
    | The BVPS (Book Value Per Share) is calculated by dividing a company's common equity value by its total number of shares outstanding. In the given example, if a company has a common equity value of $100 million and 10 million shares outstanding, its BVPS would be $10 ($100 million / 10 million). You can calculate a company's BVPS using Microsoft Excel by entering the values of common stock, retained earnings, and additional paid-in capital into cells A1 through A3. | What is the BVPS and how is it calculated? |
    | They facilitate commodities trading using their resources, can take delivery of commodities if needed, provide advisory services for clients, and act as market makers by buying and selling futures contracts to add liquidity to the marketplace. The passage uses the example of a commercial baking firm to demonstrate how their impact can be seen in the market. | What role do eligible commercial entities play in commodities trading and market liquidity? |
    | Naive diversification is a type of diversification strategy where an investor randomly selects different securities, hoping to lower the risk of the portfolio due to the varied nature of the chosen securities. It is less sophisticated than diversification methods using statistical modeling, but when guided by experience, careful security examination, and common sense, it remains an effective strategy for reducing portfolio risk. | What is the concept of naive diversification in investing and how does it compare to more sophisticated diversification methods? |
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
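
In sentence-transformers, these parameters correspond to wrapping MultipleNegativesRankingLoss (in-batch negatives ranking) in MatryoshkaLoss, which applies the inner loss at each of the listed embedding dimensions with the listed weights. A minimal sketch of how such a loss would be constructed:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

inner_loss = MultipleNegativesRankingLoss(model)   # ranks each anchor's positive above in-batch negatives
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)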
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
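
As a rough end-to-end sketch (not the exact training script), these non-default values map onto SentenceTransformerTrainingArguments as shown below; the output_dir, data path, train/eval split, and save_strategy are assumptions:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = MatryoshkaLoss(model, MultipleNegativesRankingLoss(model), matryoshka_dims=[768, 512, 256, 128, 64])

# Placeholder path and split: the card only identifies the training data as a "json"
# dataset with "positive" and "anchor" columns
dataset = load_dataset("json", data_files="train.json", split="train")
dataset = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, eval_dataset = dataset["train"], dataset["test"]

args = SentenceTransformerTrainingArguments(
    output_dir="regulatory-model",            # assumed output path
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",                    # assumed: load_best_model_at_end requires matching strategies
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()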

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Click to expand
Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0276 10 43.573 - - - - -
0.0551 20 42.1758 - - - - -
0.0827 30 37.6368 - - - - -
0.1102 40 34.5743 - - - - -
0.1378 50 29.5956 - - - - -
0.1653 60 23.4468 - - - - -
0.1929 70 19.7425 - - - - -
0.2204 80 16.9744 - - - - -
0.2480 90 15.2437 - - - - -
0.2755 100 13.9444 - - - - -
0.3031 110 12.067 - - - - -
0.3306 120 11.1149 - - - - -
0.3582 130 10.4083 - - - - -
0.3857 140 8.915 - - - - -
0.4133 150 9.4964 - - - - -
0.4408 160 8.0434 - - - - -
0.4684 170 8.1963 - - - - -
0.4960 180 8.5704 - - - - -
0.5235 190 7.711 - - - - -
0.5511 200 7.6676 - - - - -
0.5786 210 6.9899 - - - - -
0.6062 220 7.6195 - - - - -
0.6337 230 7.0456 - - - - -
0.6613 240 7.5541 - - - - -
0.6888 250 6.6543 - - - - -
0.7164 260 6.8849 - - - - -
0.7439 270 7.6635 - - - - -
0.7715 280 7.2155 - - - - -
0.7990 290 6.3284 - - - - -
0.8266 300 6.577 - - - - -
0.8541 310 5.0835 - - - - -
0.8817 320 6.1866 - - - - -
0.9092 330 5.9467 - - - - -
0.9368 340 5.663 - - - - -
0.9644 350 5.417 - - - - -
0.9919 360 6.0331 - - - - -
0.9974 362 - 0.6940 0.6900 0.6791 0.6603 0.6273
1.0220 370 5.5374 - - - - -
1.0496 380 4.5917 - - - - -
1.0771 390 4.6483 - - - - -
1.1047 400 4.96 - - - - -
1.1323 410 4.6808 - - - - -
1.1598 420 5.2396 - - - - -
1.1874 430 4.651 - - - - -
1.2149 440 4.4875 - - - - -
1.2425 450 4.6877 - - - - -
1.2700 460 4.2209 - - - - -
1.2976 470 4.678 - - - - -
1.3251 480 4.6774 - - - - -
1.3527 490 4.4409 - - - - -
1.3802 500 4.4464 - - - - -
1.4078 510 4.2724 - - - - -
1.4353 520 4.5017 - - - - -
1.4629 530 4.3469 - - - - -
1.4904 540 4.4925 - - - - -
1.5180 550 3.922 - - - - -
1.5455 560 4.6949 - - - - -
1.5731 570 4.0364 - - - - -
1.6007 580 4.3846 - - - - -
1.6282 590 3.7526 - - - - -
1.6558 600 4.0508 - - - - -
1.6833 610 4.6315 - - - - -
1.7109 620 3.7683 - - - - -
1.7384 630 4.6994 - - - - -
1.7660 640 4.1994 - - - - -
1.7935 650 4.3915 - - - - -
1.8211 660 4.2947 - - - - -
1.8486 670 4.6972 - - - - -
1.8762 680 4.1664 - - - - -
1.9037 690 4.1861 - - - - -
1.9313 700 3.6879 - - - - -
1.9588 710 4.3767 - - - - -
1.9864 720 4.48 - - - - -
1.9974 724 - 0.7013 0.6971 0.6885 0.6716 0.6414
2.0165 730 3.6164 - - - - -
2.0441 740 3.3361 - - - - -
2.0716 750 3.4175 - - - - -
2.0992 760 3.9006 - - - - -
2.1267 770 3.0823 - - - - -
2.1543 780 3.029 - - - - -
2.1818 790 3.8081 - - - - -
2.2094 800 3.4486 - - - - -
2.2370 810 3.6064 - - - - -
2.2645 820 3.0896 - - - - -
2.2921 830 3.3233 - - - - -
2.3196 840 2.9528 - - - - -
2.3472 850 3.0482 - - - - -
2.3747 860 3.2795 - - - - -
2.4023 870 2.9218 - - - - -
2.4298 880 3.4518 - - - - -
2.4574 890 3.6095 - - - - -
2.4849 900 3.2002 - - - - -
2.5125 910 3.368 - - - - -
2.5400 920 3.0623 - - - - -
2.5676 930 3.3495 - - - - -
2.5951 940 3.7123 - - - - -
2.6227 950 3.7795 - - - - -
2.6502 960 3.5567 - - - - -
2.6778 970 3.3498 - - - - -
2.7054 980 3.3141 - - - - -
2.7329 990 2.9425 - - - - -
2.7605 1000 2.9978 - - - - -
2.7880 1010 3.2468 - - - - -
2.8156 1020 2.5252 - - - - -
2.8431 1030 3.3108 - - - - -
2.8707 1040 3.195 - - - - -
2.8982 1050 3.1019 - - - - -
2.9258 1060 3.7059 - - - - -
2.9533 1070 3.1952 - - - - -
2.9809 1080 3.2454 - - - - -
2.9974 1086 - 0.7056 0.7030 0.6939 0.6779 0.6505
3.0110 1090 3.3788 - - - - -
3.0386 1100 2.9617 - - - - -
3.0661 1110 3.4313 - - - - -
3.0937 1120 2.5883 - - - - -
3.1212 1130 2.8836 - - - - -
3.1488 1140 2.3895 - - - - -
3.1763 1150 2.5155 - - - - -
3.2039 1160 3.3168 - - - - -
3.2314 1170 3.0286 - - - - -
3.2590 1180 3.1494 - - - - -
3.2866 1190 2.87 - - - - -
3.3141 1200 2.591 - - - - -
3.3417 1210 2.8437 - - - - -
3.3692 1220 3.0344 - - - - -
3.3968 1230 3.0685 - - - - -
3.4243 1240 3.4623 - - - - -
3.4519 1250 3.4256 - - - - -
3.4794 1260 2.7349 - - - - -
3.5070 1270 2.8587 - - - - -
3.5345 1280 2.729 - - - - -
3.5621 1290 3.0288 - - - - -
3.5896 1300 2.6599 - - - - -
3.6172 1310 2.4755 - - - - -
3.6447 1320 3.0501 - - - - -
3.6723 1330 2.545 - - - - -
3.6998 1340 2.5919 - - - - -
3.7274 1350 2.9026 - - - - -
3.7550 1360 2.7362 - - - - -
3.7825 1370 3.3311 - - - - -
3.8101 1380 2.8415 - - - - -
3.8376 1390 3.2033 - - - - -
3.8652 1400 2.7483 - - - - -
3.8927 1410 3.0403 - - - - -
3.9203 1420 3.0724 - - - - -
3.9478 1430 2.9797 - - - - -
3.9754 1440 2.6779 - - - - -
3.9974 1448 - 0.7073 0.704 0.6945 0.6793 0.6523
  • The saved checkpoint corresponds to the final row (epoch 3.9974, step 1448), whose ndcg@10 values match the evaluation metrics reported above.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.4.1
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}