BGE-M3 Financial Matryoshka

This is a sentence-transformers model fine-tuned from BAAI/bge-m3. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss at 1024, 768, 512, and 384 dimensions, so embeddings can also be truncated to those sizes.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
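
The Pooling and Normalize stages above can be illustrated with a small sketch. It assumes the CLS vector is the first token's hidden state (for XLM-RoBERTa this is the `<s>` token) and uses a toy 4-dimensional input instead of the real 1024 dimensions. The function name `cls_pool_and_normalize` is hypothetical and is not part of the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first (CLS) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional hidden states (the real model uses 1024).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # CLS token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 1.0],
]
sentence_embedding = cls_pool_and_normalize(token_embeddings)
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity.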

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("haophancs/bge-m3-financial-matryoshka")
# Run inference
sentences = [
    'As of January 28, 2024 the net carrying value of our inventories was $1.3 billion, which included provisions for obsolete and damaged inventory of $139.7 million.',
    "What is the status of the company's inventory as of January 28, 2024, in terms of its valuation and provisions for obsolescence?",
    'What is the relationship between the ESG goals and the long-term growth strategy?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
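
Because the model was trained with MatryoshkaLoss, embeddings can be shortened by keeping only their leading components. The sketch below shows the idea on toy 8-dimensional vectors in plain Python; the helper names are hypothetical. In Sentence Transformers itself, the `truncate_dim` argument of the `SentenceTransformer` constructor should give the same effect, for example `truncate_dim=512`.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for 1024-dimensional model outputs.
doc = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
query = [0.5, 0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0]

full_score = cosine(query, doc)

# Truncate both sides to the same size before comparing.
short_doc = truncate_and_normalize(doc, 4)
short_query = truncate_and_normalize(query, 4)
short_score = cosine(short_query, short_doc)

print(len(short_doc), round(full_score, 4), round(short_score, 4))
```

Queries and documents must be truncated to the same dimension. Scores at a reduced dimension are not directly comparable to full-dimension scores.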

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.7171
cosine_accuracy@3 0.8314
cosine_accuracy@5 0.87
cosine_accuracy@10 0.9143
cosine_precision@1 0.7171
cosine_precision@3 0.2771
cosine_precision@5 0.174
cosine_precision@10 0.0914
cosine_recall@1 0.7171
cosine_recall@3 0.8314
cosine_recall@5 0.87
cosine_recall@10 0.9143
cosine_ndcg@10 0.8152
cosine_mrr@10 0.7836
cosine_map@100 0.7867

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.7129
cosine_accuracy@3 0.8343
cosine_accuracy@5 0.8657
cosine_accuracy@10 0.91
cosine_precision@1 0.7129
cosine_precision@3 0.2781
cosine_precision@5 0.1731
cosine_precision@10 0.091
cosine_recall@1 0.7129
cosine_recall@3 0.8343
cosine_recall@5 0.8657
cosine_recall@10 0.91
cosine_ndcg@10 0.8122
cosine_mrr@10 0.7809
cosine_map@100 0.7843

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.7114
cosine_accuracy@3 0.8357
cosine_accuracy@5 0.8643
cosine_accuracy@10 0.91
cosine_precision@1 0.7114
cosine_precision@3 0.2786
cosine_precision@5 0.1729
cosine_precision@10 0.091
cosine_recall@1 0.7114
cosine_recall@3 0.8357
cosine_recall@5 0.8643
cosine_recall@10 0.91
cosine_ndcg@10 0.811
cosine_mrr@10 0.7793
cosine_map@100 0.7827

Information Retrieval (dim_384)

Metric Value
cosine_accuracy@1 0.7143
cosine_accuracy@3 0.8329
cosine_accuracy@5 0.8629
cosine_accuracy@10 0.9129
cosine_precision@1 0.7143
cosine_precision@3 0.2776
cosine_precision@5 0.1726
cosine_precision@10 0.0913
cosine_recall@1 0.7143
cosine_recall@3 0.8329
cosine_recall@5 0.8629
cosine_recall@10 0.9129
cosine_ndcg@10 0.8126
cosine_mrr@10 0.7806
cosine_map@100 0.7838
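
In the tables above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example 0.8314 / 3 ≈ 0.2771), which suggests each query has exactly one relevant passage. Under that assumption, the per-query metrics reduce to the sketch below. The function names and the example rankings are hypothetical, and this is not the library's evaluator.

```python
import math

def ir_metrics(ranked_ids, relevant_id, k):
    """Metrics at cutoff k for one query that has exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else None
    return {
        "accuracy": 1.0 if hit else 0.0,
        "precision": (1.0 / k) if hit else 0.0,
        "recall": 1.0 if hit else 0.0,
        "mrr": (1.0 / rank) if hit else 0.0,
        # With one relevant document the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": (1.0 / math.log2(rank + 1)) if hit else 0.0,
    }

def mean_metrics(results, k):
    """Average each metric over all queries."""
    per_query = [ir_metrics(ranked, rel, k) for ranked, rel in results]
    return {
        name: sum(m[name] for m in per_query) / len(per_query)
        for name in per_query[0]
    }

# Hypothetical rankings: (ranked document ids, id of the single relevant document).
results = [
    (["d1", "d7", "d3"], "d1"),  # relevant at rank 1
    (["d4", "d2", "d9"], "d2"),  # relevant at rank 2
    (["d5", "d6", "d8"], "d0"),  # relevant document not retrieved
]
scores = mean_metrics(results, k=3)
print({name: round(value, 4) for name, value in scores.items()})
```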

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive (string): min 11 tokens, mean 51.97 tokens, max 1146 tokens
    • anchor (string): min 7 tokens, mean 21.63 tokens, max 47 tokens
  • Samples:
    1. positive: From fiscal year 2022 to 2023, the cost of revenue as a percentage of total net revenue decreased by 3 percent.
       anchor: What was the percentage change in cost of revenue as a percentage of total net revenue from fiscal year 2022 to 2023?
    2. positive: •Operating income increased $321 million, or 2%, to $18.1 billion versus year ago due to the increase in net sales, partially offset by a modest decrease in operating margin.
       anchor: What factors contributed to the increase in operating income for Procter & Gamble in 2023?
    3. positive: market specific brands including 'Aurrera,' 'Lider,' and 'PhonePe.'
       anchor: What specific brands does Walmart International market?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            384
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
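
The configuration above combines two ideas: an in-batch ranking loss (each anchor should score its own positive above every other positive in the batch) and a sum of that loss over truncated embedding prefixes. The following is a minimal plain-Python sketch of that combination, not the library's implementation. The similarity scale of 20.0 is an assumption, and the toy dimensions [4, 2] stand in for [1024, 768, 512, 384].

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the target for anchors[i] and
    every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        log_sum_exp = math.log(sum(math.exp(z) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    loss = 0.0
    for dim, weight in zip(dims, weights):
        loss += weight * in_batch_ranking_loss(
            [a[:dim] for a in anchors],
            [p[:dim] for p in positives],
        )
    return loss

# Toy batch of 2 (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
mismatched = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(loss < mismatched)  # True
```

Since other in-batch positives act as negatives, the no_duplicates batch sampler helps avoid treating an identical passage as a negative.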
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
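
These settings determine the number of optimizer steps and the learning-rate curve. The sketch below works through the arithmetic, assuming a single device and a schedule with linear warmup followed by cosine decay to zero. The variable names are illustrative. The resulting total of 3148 steps agrees with the last step in the training logs.

```python
import math

train_samples = 6300
per_device_batch = 4
grad_accum = 2
epochs = 4
base_lr = 2e-05
warmup_ratio = 0.1

batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 1575
updates_per_epoch = batches_per_epoch // grad_accum              # 787
total_steps = epochs * updates_per_epoch                         # 3148
warmup_steps = math.ceil(total_steps * warmup_ratio)             # 315

def lr_at(step):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size per optimizer step on one device.
effective_batch = per_device_batch * grad_accum  # 8

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```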

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_384_cosine_map@100 dim_512_cosine_map@100 dim_768_cosine_map@100
0.0127 10 0.2059 - - - -
0.0254 20 0.2612 - - - -
0.0381 30 0.0873 - - - -
0.0508 40 0.1352 - - - -
0.0635 50 0.156 - - - -
0.0762 60 0.0407 - - - -
0.0889 70 0.09 - - - -
0.1016 80 0.027 - - - -
0.1143 90 0.0978 - - - -
0.1270 100 0.0105 - - - -
0.1397 110 0.0402 - - - -
0.1524 120 0.0745 - - - -
0.1651 130 0.0655 - - - -
0.1778 140 0.0075 - - - -
0.1905 150 0.0141 - - - -
0.2032 160 0.0615 - - - -
0.2159 170 0.0029 - - - -
0.2286 180 0.0269 - - - -
0.2413 190 0.0724 - - - -
0.2540 200 0.0218 - - - -
0.2667 210 0.0027 - - - -
0.2794 220 0.007 - - - -
0.2921 230 0.0814 - - - -
0.3048 240 0.0326 - - - -
0.3175 250 0.0061 - - - -
0.3302 260 0.0471 - - - -
0.3429 270 0.0115 - - - -
0.3556 280 0.0021 - - - -
0.3683 290 0.0975 - - - -
0.3810 300 0.0572 - - - -
0.3937 310 0.0125 - - - -
0.4063 320 0.04 - - - -
0.4190 330 0.0023 - - - -
0.4317 340 0.0121 - - - -
0.4444 350 0.0116 - - - -
0.4571 360 0.0059 - - - -
0.4698 370 0.0217 - - - -
0.4825 380 0.0294 - - - -
0.4952 390 0.1102 - - - -
0.5079 400 0.0103 - - - -
0.5206 410 0.0023 - - - -
0.5333 420 0.0157 - - - -
0.5460 430 0.0805 - - - -
0.5587 440 0.0168 - - - -
0.5714 450 0.1279 - - - -
0.5841 460 0.2012 - - - -
0.5968 470 0.0436 - - - -
0.6095 480 0.0204 - - - -
0.6222 490 0.0097 - - - -
0.6349 500 0.0013 - - - -
0.6476 510 0.0042 - - - -
0.6603 520 0.0034 - - - -
0.6730 530 0.0226 - - - -
0.6857 540 0.0267 - - - -
0.6984 550 0.0007 - - - -
0.7111 560 0.0766 - - - -
0.7238 570 0.2174 - - - -
0.7365 580 0.0089 - - - -
0.7492 590 0.0794 - - - -
0.7619 600 0.0031 - - - -
0.7746 610 0.0499 - - - -
0.7873 620 0.0105 - - - -
0.8 630 0.0097 - - - -
0.8127 640 0.0028 - - - -
0.8254 650 0.0029 - - - -
0.8381 660 0.1811 - - - -
0.8508 670 0.064 - - - -
0.8635 680 0.0139 - - - -
0.8762 690 0.055 - - - -
0.8889 700 0.0013 - - - -
0.9016 710 0.0402 - - - -
0.9143 720 0.0824 - - - -
0.9270 730 0.03 - - - -
0.9397 740 0.0337 - - - -
0.9524 750 0.1192 - - - -
0.9651 760 0.0039 - - - -
0.9778 770 0.004 - - - -
0.9905 780 0.1413 - - - -
0.9994 787 - 0.7851 0.7794 0.7822 0.7863
1.0032 790 0.019 - - - -
1.0159 800 0.0587 - - - -
1.0286 810 0.0186 - - - -
1.0413 820 0.0018 - - - -
1.0540 830 0.0631 - - - -
1.0667 840 0.0127 - - - -
1.0794 850 0.0037 - - - -
1.0921 860 0.0029 - - - -
1.1048 870 0.1437 - - - -
1.1175 880 0.0015 - - - -
1.1302 890 0.0024 - - - -
1.1429 900 0.0133 - - - -
1.1556 910 0.0245 - - - -
1.1683 920 0.0017 - - - -
1.1810 930 0.0007 - - - -
1.1937 940 0.002 - - - -
1.2063 950 0.0044 - - - -
1.2190 960 0.0009 - - - -
1.2317 970 0.01 - - - -
1.2444 980 0.0026 - - - -
1.2571 990 0.0017 - - - -
1.2698 1000 0.0014 - - - -
1.2825 1010 0.0009 - - - -
1.2952 1020 0.0829 - - - -
1.3079 1030 0.0011 - - - -
1.3206 1040 0.012 - - - -
1.3333 1050 0.0019 - - - -
1.3460 1060 0.0007 - - - -
1.3587 1070 0.0141 - - - -
1.3714 1080 0.0003 - - - -
1.3841 1090 0.001 - - - -
1.3968 1100 0.0005 - - - -
1.4095 1110 0.0031 - - - -
1.4222 1120 0.0004 - - - -
1.4349 1130 0.0054 - - - -
1.4476 1140 0.0003 - - - -
1.4603 1150 0.0007 - - - -
1.4730 1160 0.0009 - - - -
1.4857 1170 0.001 - - - -
1.4984 1180 0.0006 - - - -
1.5111 1190 0.0046 - - - -
1.5238 1200 0.0003 - - - -
1.5365 1210 0.0002 - - - -
1.5492 1220 0.004 - - - -
1.5619 1230 0.0017 - - - -
1.5746 1240 0.0003 - - - -
1.5873 1250 0.0027 - - - -
1.6 1260 0.1134 - - - -
1.6127 1270 0.0007 - - - -
1.6254 1280 0.0005 - - - -
1.6381 1290 0.0008 - - - -
1.6508 1300 0.0001 - - - -
1.6635 1310 0.0023 - - - -
1.6762 1320 0.0005 - - - -
1.6889 1330 0.0004 - - - -
1.7016 1340 0.0003 - - - -
1.7143 1350 0.0347 - - - -
1.7270 1360 0.0339 - - - -
1.7397 1370 0.0003 - - - -
1.7524 1380 0.0005 - - - -
1.7651 1390 0.0002 - - - -
1.7778 1400 0.0031 - - - -
1.7905 1410 0.0002 - - - -
1.8032 1420 0.0012 - - - -
1.8159 1430 0.0002 - - - -
1.8286 1440 0.0002 - - - -
1.8413 1450 0.0004 - - - -
1.8540 1460 0.011 - - - -
1.8667 1470 0.0824 - - - -
1.8794 1480 0.0003 - - - -
1.8921 1490 0.0004 - - - -
1.9048 1500 0.0006 - - - -
1.9175 1510 0.015 - - - -
1.9302 1520 0.0004 - - - -
1.9429 1530 0.0004 - - - -
1.9556 1540 0.0011 - - - -
1.9683 1550 0.0003 - - - -
1.9810 1560 0.0006 - - - -
1.9937 1570 0.0042 - - - -
2.0 1575 - 0.7862 0.7855 0.7852 0.7878
2.0063 1580 0.0005 - - - -
2.0190 1590 0.002 - - - -
2.0317 1600 0.0013 - - - -
2.0444 1610 0.0002 - - - -
2.0571 1620 0.0035 - - - -
2.0698 1630 0.0004 - - - -
2.0825 1640 0.0002 - - - -
2.0952 1650 0.0032 - - - -
2.1079 1660 0.0916 - - - -
2.1206 1670 0.0002 - - - -
2.1333 1680 0.0006 - - - -
2.1460 1690 0.0002 - - - -
2.1587 1700 0.0003 - - - -
2.1714 1710 0.0001 - - - -
2.1841 1720 0.0001 - - - -
2.1968 1730 0.0004 - - - -
2.2095 1740 0.0004 - - - -
2.2222 1750 0.0001 - - - -
2.2349 1760 0.0002 - - - -
2.2476 1770 0.0007 - - - -
2.2603 1780 0.0001 - - - -
2.2730 1790 0.0002 - - - -
2.2857 1800 0.0004 - - - -
2.2984 1810 0.0711 - - - -
2.3111 1820 0.0001 - - - -
2.3238 1830 0.0005 - - - -
2.3365 1840 0.0004 - - - -
2.3492 1850 0.0001 - - - -
2.3619 1860 0.0005 - - - -
2.3746 1870 0.0003 - - - -
2.3873 1880 0.0001 - - - -
2.4 1890 0.0002 - - - -
2.4127 1900 0.0001 - - - -
2.4254 1910 0.0002 - - - -
2.4381 1920 0.0002 - - - -
2.4508 1930 0.0002 - - - -
2.4635 1940 0.0004 - - - -
2.4762 1950 0.0001 - - - -
2.4889 1960 0.0002 - - - -
2.5016 1970 0.0002 - - - -
2.5143 1980 0.0001 - - - -
2.5270 1990 0.0001 - - - -
2.5397 2000 0.0002 - - - -
2.5524 2010 0.0023 - - - -
2.5651 2020 0.0002 - - - -
2.5778 2030 0.0001 - - - -
2.5905 2040 0.0003 - - - -
2.6032 2050 0.0003 - - - -
2.6159 2060 0.0002 - - - -
2.6286 2070 0.0001 - - - -
2.6413 2080 0.0 - - - -
2.6540 2090 0.0001 - - - -
2.6667 2100 0.0001 - - - -
2.6794 2110 0.0001 - - - -
2.6921 2120 0.0001 - - - -
2.7048 2130 0.0001 - - - -
2.7175 2140 0.0048 - - - -
2.7302 2150 0.0005 - - - -
2.7429 2160 0.0001 - - - -
2.7556 2170 0.0001 - - - -
2.7683 2180 0.0001 - - - -
2.7810 2190 0.0001 - - - -
2.7937 2200 0.0001 - - - -
2.8063 2210 0.0001 - - - -
2.8190 2220 0.0001 - - - -
2.8317 2230 0.0002 - - - -
2.8444 2240 0.0036 - - - -
2.8571 2250 0.0001 - - - -
2.8698 2260 0.0368 - - - -
2.8825 2270 0.0003 - - - -
2.8952 2280 0.0002 - - - -
2.9079 2290 0.0001 - - - -
2.9206 2300 0.0005 - - - -
2.9333 2310 0.0001 - - - -
2.9460 2320 0.0001 - - - -
2.9587 2330 0.0003 - - - -
2.9714 2340 0.0001 - - - -
2.9841 2350 0.0001 - - - -
2.9968 2360 0.0002 - - - -
2.9994 2362 - 0.7864 0.7805 0.7838 0.7852
3.0095 2370 0.0025 - - - -
3.0222 2380 0.0002 - - - -
3.0349 2390 0.0001 - - - -
3.0476 2400 0.0001 - - - -
3.0603 2410 0.0001 - - - -
3.0730 2420 0.0001 - - - -
3.0857 2430 0.0001 - - - -
3.0984 2440 0.0002 - - - -
3.1111 2450 0.0116 - - - -
3.1238 2460 0.0002 - - - -
3.1365 2470 0.0001 - - - -
3.1492 2480 0.0001 - - - -
3.1619 2490 0.0001 - - - -
3.1746 2500 0.0001 - - - -
3.1873 2510 0.0001 - - - -
3.2 2520 0.0001 - - - -
3.2127 2530 0.0001 - - - -
3.2254 2540 0.0001 - - - -
3.2381 2550 0.0002 - - - -
3.2508 2560 0.0001 - - - -
3.2635 2570 0.0001 - - - -
3.2762 2580 0.0001 - - - -
3.2889 2590 0.0001 - - - -
3.3016 2600 0.063 - - - -
3.3143 2610 0.0001 - - - -
3.3270 2620 0.0001 - - - -
3.3397 2630 0.0001 - - - -
3.3524 2640 0.0001 - - - -
3.3651 2650 0.0002 - - - -
3.3778 2660 0.0001 - - - -
3.3905 2670 0.0001 - - - -
3.4032 2680 0.0001 - - - -
3.4159 2690 0.0001 - - - -
3.4286 2700 0.0001 - - - -
3.4413 2710 0.0001 - - - -
3.4540 2720 0.0002 - - - -
3.4667 2730 0.0001 - - - -
3.4794 2740 0.0001 - - - -
3.4921 2750 0.0001 - - - -
3.5048 2760 0.0001 - - - -
3.5175 2770 0.0002 - - - -
3.5302 2780 0.0001 - - - -
3.5429 2790 0.0001 - - - -
3.5556 2800 0.0001 - - - -
3.5683 2810 0.0001 - - - -
3.5810 2820 0.0001 - - - -
3.5937 2830 0.0001 - - - -
3.6063 2840 0.0001 - - - -
3.6190 2850 0.0 - - - -
3.6317 2860 0.0001 - - - -
3.6444 2870 0.0001 - - - -
3.6571 2880 0.0001 - - - -
3.6698 2890 0.0001 - - - -
3.6825 2900 0.0001 - - - -
3.6952 2910 0.0001 - - - -
3.7079 2920 0.0001 - - - -
3.7206 2930 0.0003 - - - -
3.7333 2940 0.0001 - - - -
3.7460 2950 0.0001 - - - -
3.7587 2960 0.0001 - - - -
3.7714 2970 0.0002 - - - -
3.7841 2980 0.0001 - - - -
3.7968 2990 0.0001 - - - -
3.8095 3000 0.0001 - - - -
3.8222 3010 0.0001 - - - -
3.8349 3020 0.0002 - - - -
3.8476 3030 0.0001 - - - -
3.8603 3040 0.0001 - - - -
3.8730 3050 0.0214 - - - -
3.8857 3060 0.0001 - - - -
3.8984 3070 0.0001 - - - -
3.9111 3080 0.0001 - - - -
3.9238 3090 0.0001 - - - -
3.9365 3100 0.0001 - - - -
3.9492 3110 0.0001 - - - -
3.9619 3120 0.0001 - - - -
3.9746 3130 0.0001 - - - -
3.9873 3140 0.0001 - - - -
3.9975 3148 - 0.7867 0.7838 0.7827 0.7843
  • The saved checkpoint corresponds to the final row (step 3148); its map@100 values match the evaluation results above.

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 568M params (tensor type: F32)