BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
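
The three modules above can be read as: run BERT, keep only the vector of the first ([CLS]) token, then scale that vector to unit length. The following is a minimal pure-Python sketch of the last two steps. It uses made-up 4-dimensional token vectors in place of the real 768-dimensional BERT outputs, and the function names are illustrative rather than part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the vector of the first token ([CLS]), ignoring the rest."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy stand-in for one sentence's BERT output: 3 tokens x 4 dimensions.
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],   # [CLS]
    [1.0, 1.0, 1.0, 1.0],   # a word piece
    [0.5, 0.0, 2.0, 1.0],   # [SEP]
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # [0.6, 0.8, 0.0, 0.0]
```

Because every output vector has unit length, the dot product of two embeddings equals their cosine similarity.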

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")
# Run inference
sentences = [
    'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium".\nNormalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment.\nLogprob of an indirect "True/False" token after the raw answer.\nTheir experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift.  Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.',
    'In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution',
    'Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
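
The training logs report separate scores for 768, 512, 256, 128, and 64 dimensions, which suggests that a leading slice of each embedding can be used on its own. The sketch below shows the idea on toy 8-dimensional vectors: keep the first few values, rescale to unit length, and compare. The vectors and helper names are assumptions made for illustration; with this model the slice sizes would be the dimensions listed above.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values, then rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for two rows of `embeddings`.
doc = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
query = [0.5, 0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0]

full_score = cosine(doc, query)

# Compare again using only the first 4 dimensions of each vector.
short_doc = truncate_and_normalize(doc, 4)
short_query = truncate_and_normalize(query, 4)
short_score = cosine(short_doc, short_query)

print(len(short_doc), round(full_score, 4), round(short_score, 4))
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which may be more convenient than slicing by hand.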

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.9208
cosine_accuracy@3 0.995
cosine_accuracy@5 0.995
cosine_accuracy@10 1.0
cosine_precision@1 0.9208
cosine_precision@3 0.3317
cosine_precision@5 0.199
cosine_precision@10 0.1
cosine_recall@1 0.9208
cosine_recall@3 0.995
cosine_recall@5 0.995
cosine_recall@10 1.0
cosine_ndcg@10 0.9694
cosine_mrr@10 0.9587
cosine_map@100 0.9587

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.9257
cosine_accuracy@3 0.995
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9257
cosine_precision@3 0.3317
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9257
cosine_recall@3 0.995
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9716
cosine_mrr@10 0.9616
cosine_map@100 0.9616

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.9158
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9158
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9158
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9676
cosine_mrr@10 0.9563
cosine_map@100 0.9563

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.9158
cosine_accuracy@3 0.995
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9158
cosine_precision@3 0.3317
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9158
cosine_recall@3 0.995
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9677
cosine_mrr@10 0.9564
cosine_map@100 0.9564

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.901
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.901
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.901
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9622
cosine_mrr@10 0.9488
cosine_map@100 0.9488
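
In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example 0.995 / 3 ≈ 0.3317). That pattern is what one would expect if each query has exactly one relevant passage. The sketch below computes the metrics under that assumption; the ranks are made up and the function is illustrative, not the evaluator used to produce the tables.

```python
def retrieval_metrics(ranks, k):
    """Compute accuracy@k, precision@k, recall@k and MRR@k.

    `ranks` holds, per query, the 1-based position of its single relevant
    document in the ranked results (None if it was not retrieved).
    Assumes exactly one relevant document per query.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r is not None and r <= k)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": hits / (n * k),   # at most one relevant doc in k slots
        "recall": accuracy,            # only one relevant doc exists
        "mrr": sum(1.0 / r for r in ranks if r is not None and r <= k) / n,
    }

# Hypothetical ranks of the relevant passage for four queries.
ranks = [1, 1, 2, 4]
at_3 = retrieval_metrics(ranks, k=3)
print(at_3)
```

Here three of the four queries find their passage in the top 3, so accuracy@3 and recall@3 are 0.75, precision@3 is 0.25, and MRR@3 is (1 + 1 + 0.5) / 4 = 0.625.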

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
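
To make the learning-rate settings concrete, the sketch below evaluates a linear warmup followed by a half-cosine decay, using learning_rate 2e-05, warmup_ratio 0.1, and the 1135 total steps shown in the training logs. It follows the usual shape of a cosine schedule with warmup; the exact implementation in Transformers may differ in details such as rounding, so treat the function as an approximation.

```python
import math

def cosine_lr(step, base_lr=2e-5, total_steps=1135, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 57, 114, 625, 1135):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 around step 114 and decays to zero by step 1135.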

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0220 5 6.6173 - - - - -
0.0441 10 5.5321 - - - - -
0.0661 15 5.656 - - - - -
0.0881 20 4.9256 - - - - -
0.1101 25 5.0757 - - - - -
0.1322 30 5.2047 - - - - -
0.1542 35 5.1307 - - - - -
0.1762 40 4.9219 - - - - -
0.1982 45 5.1957 - - - - -
0.2203 50 5.36 - - - - -
0.2423 55 3.0865 - - - - -
0.2643 60 3.7054 - - - - -
0.2863 65 2.9541 - - - - -
0.3084 70 3.5521 - - - - -
0.3304 75 3.5665 - - - - -
0.3524 80 2.9532 - - - - -
0.3744 85 2.5121 - - - - -
0.3965 90 3.1269 - - - - -
0.4185 95 3.4048 - - - - -
0.4405 100 2.8126 - - - - -
0.4626 105 1.6847 - - - - -
0.4846 110 1.3331 - - - - -
0.5066 115 2.4799 - - - - -
0.5286 120 2.1176 - - - - -
0.5507 125 2.4249 - - - - -
0.5727 130 3.3705 - - - - -
0.5947 135 1.551 - - - - -
0.6167 140 1.328 - - - - -
0.6388 145 1.9353 - - - - -
0.6608 150 2.4254 - - - - -
0.6828 155 1.8436 - - - - -
0.7048 160 1.1937 - - - - -
0.7269 165 2.164 - - - - -
0.7489 170 2.2921 - - - - -
0.7709 175 2.4385 - - - - -
0.7930 180 1.2392 - - - - -
0.8150 185 1.0472 - - - - -
0.8370 190 1.5844 - - - - -
0.8590 195 1.2492 - - - - -
0.8811 200 1.6774 - - - - -
0.9031 205 2.485 - - - - -
0.9251 210 2.4781 - - - - -
0.9471 215 2.4476 - - - - -
0.9692 220 2.6243 - - - - -
0.9912 225 1.3651 - - - - -
1.0 227 - 0.9066 0.9112 0.9257 0.8906 0.9182
1.0132 230 1.0575 - - - - -
1.0352 235 1.4499 - - - - -
1.0573 240 1.4333 - - - - -
1.0793 245 1.1148 - - - - -
1.1013 250 1.259 - - - - -
1.1233 255 0.873 - - - - -
1.1454 260 1.646 - - - - -
1.1674 265 1.7583 - - - - -
1.1894 270 1.2268 - - - - -
1.2115 275 1.3792 - - - - -
1.2335 280 2.5662 - - - - -
1.2555 285 1.5021 - - - - -
1.2775 290 1.1399 - - - - -
1.2996 295 1.3307 - - - - -
1.3216 300 0.7458 - - - - -
1.3436 305 1.1029 - - - - -
1.3656 310 1.0205 - - - - -
1.3877 315 1.0998 - - - - -
1.4097 320 0.8304 - - - - -
1.4317 325 1.3673 - - - - -
1.4537 330 2.4445 - - - - -
1.4758 335 2.8757 - - - - -
1.4978 340 1.7879 - - - - -
1.5198 345 1.1255 - - - - -
1.5419 350 1.6743 - - - - -
1.5639 355 1.3803 - - - - -
1.5859 360 1.1998 - - - - -
1.6079 365 1.2129 - - - - -
1.6300 370 1.6588 - - - - -
1.6520 375 0.9827 - - - - -
1.6740 380 0.605 - - - - -
1.6960 385 1.2934 - - - - -
1.7181 390 1.1776 - - - - -
1.7401 395 1.445 - - - - -
1.7621 400 0.6393 - - - - -
1.7841 405 0.9303 - - - - -
1.8062 410 0.7541 - - - - -
1.8282 415 0.5413 - - - - -
1.8502 420 1.5258 - - - - -
1.8722 425 1.4257 - - - - -
1.8943 430 1.3111 - - - - -
1.9163 435 1.6604 - - - - -
1.9383 440 1.4004 - - - - -
1.9604 445 2.7186 - - - - -
1.9824 450 2.2757 - - - - -
2.0 454 - 0.9401 0.9433 0.9387 0.9386 0.9416
2.0044 455 0.9345 - - - - -
2.0264 460 0.9325 - - - - -
2.0485 465 1.2434 - - - - -
2.0705 470 1.5161 - - - - -
2.0925 475 2.6011 - - - - -
2.1145 480 1.8276 - - - - -
2.1366 485 1.5005 - - - - -
2.1586 490 0.8618 - - - - -
2.1806 495 2.1422 - - - - -
2.2026 500 1.3922 - - - - -
2.2247 505 1.5939 - - - - -
2.2467 510 1.3021 - - - - -
2.2687 515 1.0825 - - - - -
2.2907 520 0.9066 - - - - -
2.3128 525 0.7717 - - - - -
2.3348 530 1.1484 - - - - -
2.3568 535 1.6513 - - - - -
2.3789 540 1.7267 - - - - -
2.4009 545 0.7659 - - - - -
2.4229 550 2.0213 - - - - -
2.4449 555 0.5329 - - - - -
2.4670 560 1.2083 - - - - -
2.4890 565 1.5432 - - - - -
2.5110 570 0.5423 - - - - -
2.5330 575 0.2613 - - - - -
2.5551 580 0.7985 - - - - -
2.5771 585 0.3003 - - - - -
2.5991 590 2.2234 - - - - -
2.6211 595 0.4772 - - - - -
2.6432 600 1.0158 - - - - -
2.6652 605 2.6385 - - - - -
2.6872 610 0.7042 - - - - -
2.7093 615 1.1469 - - - - -
2.7313 620 1.4092 - - - - -
2.7533 625 0.6487 - - - - -
2.7753 630 1.218 - - - - -
2.7974 635 1.1509 - - - - -
2.8194 640 1.1524 - - - - -
2.8414 645 0.6477 - - - - -
2.8634 650 0.6295 - - - - -
2.8855 655 1.3026 - - - - -
2.9075 660 1.9196 - - - - -
2.9295 665 1.3743 - - - - -
2.9515 670 0.8934 - - - - -
2.9736 675 1.1801 - - - - -
2.9956 680 1.2952 - - - - -
3.0 681 - 0.9538 0.9513 0.9538 0.9414 0.9435
3.0176 685 0.3324 - - - - -
3.0396 690 0.9551 - - - - -
3.0617 695 0.9315 - - - - -
3.0837 700 1.3611 - - - - -
3.1057 705 1.4406 - - - - -
3.1278 710 0.5888 - - - - -
3.1498 715 0.9149 - - - - -
3.1718 720 0.5627 - - - - -
3.1938 725 1.6876 - - - - -
3.2159 730 1.1366 - - - - -
3.2379 735 1.3571 - - - - -
3.2599 740 1.5227 - - - - -
3.2819 745 2.5139 - - - - -
3.3040 750 0.3735 - - - - -
3.3260 755 1.4386 - - - - -
3.3480 760 0.3838 - - - - -
3.3700 765 0.3973 - - - - -
3.3921 770 1.4972 - - - - -
3.4141 775 1.5118 - - - - -
3.4361 780 0.478 - - - - -
3.4581 785 1.5982 - - - - -
3.4802 790 0.6209 - - - - -
3.5022 795 0.5902 - - - - -
3.5242 800 1.0877 - - - - -
3.5463 805 0.9553 - - - - -
3.5683 810 0.3054 - - - - -
3.5903 815 1.2229 - - - - -
3.6123 820 0.7434 - - - - -
3.6344 825 1.5447 - - - - -
3.6564 830 1.0751 - - - - -
3.6784 835 0.8161 - - - - -
3.7004 840 0.4382 - - - - -
3.7225 845 1.3547 - - - - -
3.7445 850 1.7112 - - - - -
3.7665 855 0.5362 - - - - -
3.7885 860 0.9309 - - - - -
3.8106 865 1.8301 - - - - -
3.8326 870 1.5554 - - - - -
3.8546 875 1.4035 - - - - -
3.8767 880 1.5814 - - - - -
3.8987 885 0.7283 - - - - -
3.9207 890 1.8549 - - - - -
3.9427 895 0.196 - - - - -
3.9648 900 1.2072 - - - - -
3.9868 905 0.83 - - - - -
4.0 908 - 0.9564 0.9587 0.9612 0.9488 0.9563
4.0088 910 1.7222 - - - - -
4.0308 915 0.6728 - - - - -
4.0529 920 0.9388 - - - - -
4.0749 925 0.7998 - - - - -
4.0969 930 1.1561 - - - - -
4.1189 935 2.4315 - - - - -
4.1410 940 1.3263 - - - - -
4.1630 945 1.2374 - - - - -
4.1850 950 1.1307 - - - - -
4.2070 955 0.5512 - - - - -
4.2291 960 1.3266 - - - - -
4.2511 965 1.2306 - - - - -
4.2731 970 1.7083 - - - - -
4.2952 975 0.7028 - - - - -
4.3172 980 1.2987 - - - - -
4.3392 985 1.545 - - - - -
4.3612 990 1.004 - - - - -
4.3833 995 0.8276 - - - - -
4.4053 1000 1.4694 - - - - -
4.4273 1005 0.4914 - - - - -
4.4493 1010 0.9894 - - - - -
4.4714 1015 0.8855 - - - - -
4.4934 1020 1.1339 - - - - -
4.5154 1025 1.0786 - - - - -
4.5374 1030 1.2547 - - - - -
4.5595 1035 0.5312 - - - - -
4.5815 1040 1.4938 - - - - -
4.6035 1045 0.8124 - - - - -
4.6256 1050 1.2401 - - - - -
4.6476 1055 1.1902 - - - - -
4.6696 1060 1.4183 - - - - -
4.6916 1065 1.0718 - - - - -
4.7137 1070 1.2203 - - - - -
4.7357 1075 0.8535 - - - - -
4.7577 1080 1.2454 - - - - -
4.7797 1085 0.4216 - - - - -
4.8018 1090 0.8327 - - - - -
4.8238 1095 1.2371 - - - - -
4.8458 1100 1.0949 - - - - -
4.8678 1105 1.2177 - - - - -
4.8899 1110 0.6236 - - - - -
4.9119 1115 0.646 - - - - -
4.9339 1120 1.1822 - - - - -
4.9559 1125 1.0471 - - - - -
4.9780 1130 0.7626 - - - - -
5.0 1135 0.9794 0.9564 0.9563 0.9616 0.9488 0.9587
  • The saved checkpoint appears to be the final row (epoch 5.0, step 1135): its map@100 values match those reported under Evaluation.
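
The Epoch column in the log is the step count divided by the number of steps per epoch (227, the step at which epoch 1.0 is reported). The short sketch below reproduces a few logged values and, assuming a single device, gradient_accumulation_steps of 1, and dataloader_drop_last set to False, estimates the range of training-set sizes consistent with 227 batches of 8. The size range is an inference from the logs, not a figure stated in this card.

```python
steps_per_epoch = 227    # step at which epoch 1.0 is logged
train_batch_size = 8     # per_device_train_batch_size

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(epoch_at(5), epoch_at(100), epoch_at(1135))

# 227 batches of 8, where the last batch may be partially filled.
max_examples = steps_per_epoch * train_batch_size
min_examples = (steps_per_epoch - 1) * train_batch_size + 1
print(min_examples, max_examples)
```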

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)