BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
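
For reference, the three modules correspond to CLS-token pooling over a BERT encoder followed by L2 normalization. Below is a rough manual equivalent using transformers directly; it is an illustrative sketch only, assuming the checkpoint loads with AutoModel/AutoTokenizer and that the tokenizer handles lowercasing, so prefer the SentenceTransformer usage shown in the next section.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Sketch of Transformer -> Pooling(cls) -> Normalize, for illustration only
tokenizer = AutoTokenizer.from_pretrained("joshuapb/fine-tuned-matryoshka-1500")
encoder = AutoModel.from_pretrained("joshuapb/fine-tuned-matryoshka-1500")

batch = tokenizer(
    ["What causes hallucinations in large language models?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    output = encoder(**batch)

embedding = output.last_hidden_state[:, 0]      # CLS pooling
embedding = F.normalize(embedding, p=2, dim=1)  # Normalize()
print(embedding.shape)  # torch.Size([1, 768])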

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500")
# Run inference
sentences = [
    'This post focuses on extrinsic hallucination. To avoid hallucination, LLMs need to be (1) factual and (2) acknowledge not knowing the answer when applicable.\nWhat Causes Hallucinations?#\nGiven a standard deployable LLM goes through pre-training and fine-tuning for alignment and other improvements, let us consider causes at both stages.\nPre-training Data Issues#\nThe volume of the pre-training data corpus is enormous, as it is supposed to represent world knowledge in all available written forms. Data crawled from the public Internet is the most common choice and thus out-of-date, missing, or incorrect information is expected. As the model may incorrectly memorize this information by simply maximizing the log-likelihood, we would expect the model to make mistakes.\nFine-tuning New Knowledge#',
    'What impact does relying on outdated data during the pre-training phase of large language models have on the accuracy of their generated outputs?',
    'In what ways do MaybeKnown examples improve the performance of a model when contrasted with HighlyKnown examples, and what implications does this have for developing effective training strategies?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
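
Because this is a Matryoshka model, embeddings can also be truncated to the smaller dimensions reported in the Evaluation section (512, 256, 128, or 64) with only a modest drop in retrieval quality. A minimal sketch using the truncate_dim argument (available in recent sentence-transformers releases):

from sentence_transformers import SentenceTransformer

# Truncate embeddings to 256 dimensions; 256 is one of the dimensions this
# model was evaluated at (768, 512, 256, 128, 64)
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500", truncate_dim=256)

embeddings = model.encode([
    "What impact does relying on outdated pre-training data have on model outputs?",
])
print(embeddings.shape)
# (1, 256)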

Evaluation

Metrics

Information Retrieval (dim_768)

Each of the five tables below reports InformationRetrievalEvaluator results at one Matryoshka embedding dimension.

Metric Value
cosine_accuracy@1 0.9531
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9531
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9531
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9827
cosine_mrr@10 0.9766
cosine_map@100 0.9766

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.9479
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9479
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9479
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9801
cosine_mrr@10 0.9731
cosine_map@100 0.9731

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.9635
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9635
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9635
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9865
cosine_mrr@10 0.9818
cosine_map@100 0.9818

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.9583
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9583
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9583
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9833
cosine_mrr@10 0.9774
cosine_map@100 0.9774

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.9583
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.9583
cosine_precision@3 0.3333
cosine_precision@5 0.2
cosine_precision@10 0.1
cosine_recall@1 0.9583
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.9833
cosine_mrr@10 0.9774
cosine_map@100 0.9774
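
The numbers above can be reproduced with sentence-transformers' InformationRetrievalEvaluator. A minimal sketch, using placeholder queries, corpus, and relevance judgments rather than the actual evaluation split (which is not included in this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-1500")

# Placeholder evaluation data: ids mapped to texts, plus relevance judgments
queries = {"q1": "What causes hallucinations in large language models?"}
corpus = {"d1": "Out-of-date, missing, or incorrect pre-training data leads to factual mistakes."}
relevant_docs = {"q1": {"d1"}}  # for each query id, the set of relevant corpus ids

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_768",
)
results = evaluator(model)
print(results)  # cosine_accuracy@k, cosine_precision@k, cosine_ndcg@10, ...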

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
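
Below is a sketch of how the non-default hyperparameters above plug into the sentence-transformers 3.x training API, combined with the MatryoshkaLoss and MultipleNegativesRankingLoss setup cited at the end of this card. The dataset, output path, and save strategy are placeholder assumptions; the Matryoshka dimensions follow the evaluation columns in the training logs.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.training_args import SentenceTransformerTrainingArguments
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Placeholder (anchor, positive) pairs; the real training data is not included here
train_dataset = Dataset.from_dict({
    "anchor": ["What causes hallucinations in large language models?"],
    "positive": ["Out-of-date or incorrect pre-training data can lead to factual mistakes."],
})

# MultipleNegativesRankingLoss wrapped in MatryoshkaLoss, matching the cited losses
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="fine-tuned-matryoshka",  # placeholder output path
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    eval_strategy="epoch",
    save_strategy="epoch",  # must match eval_strategy when load_best_model_at_end=True
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()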

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0266 5 4.6076 - - - - -
0.0532 10 5.2874 - - - - -
0.0798 15 5.4181 - - - - -
0.1064 20 5.1322 - - - - -
0.1330 25 4.1674 - - - - -
0.1596 30 4.1998 - - - - -
0.1862 35 3.4182 - - - - -
0.2128 40 4.1142 - - - - -
0.2394 45 2.5775 - - - - -
0.2660 50 3.3767 - - - - -
0.2926 55 2.5797 - - - - -
0.3191 60 3.1813 - - - - -
0.3457 65 3.7209 - - - - -
0.3723 70 2.2637 - - - - -
0.3989 75 2.2651 - - - - -
0.4255 80 2.3023 - - - - -
0.4521 85 2.3261 - - - - -
0.4787 90 1.947 - - - - -
0.5053 95 0.8502 - - - - -
0.5319 100 2.2405 - - - - -
0.5585 105 2.0157 - - - - -
0.5851 110 1.4405 - - - - -
0.6117 115 1.9714 - - - - -
0.6383 120 2.5212 - - - - -
0.6649 125 2.734 - - - - -
0.6915 130 1.9357 - - - - -
0.7181 135 1.1727 - - - - -
0.7447 140 1.9789 - - - - -
0.7713 145 1.6362 - - - - -
0.7979 150 1.7356 - - - - -
0.8245 155 1.916 - - - - -
0.8511 160 2.0372 - - - - -
0.8777 165 1.5705 - - - - -
0.9043 170 1.9393 - - - - -
0.9309 175 1.6289 - - - - -
0.9574 180 2.8158 - - - - -
0.9840 185 1.1869 - - - - -
1.0 188 - 0.9319 0.9438 0.9401 0.9173 0.9421
1.0106 190 1.1572 - - - - -
1.0372 195 1.4815 - - - - -
1.0638 200 1.6742 - - - - -
1.0904 205 0.9434 - - - - -
1.1170 210 1.6141 - - - - -
1.1436 215 0.7478 - - - - -
1.1702 220 1.4812 - - - - -
1.1968 225 1.8121 - - - - -
1.2234 230 1.2595 - - - - -
1.25 235 1.8326 - - - - -
1.2766 240 1.3828 - - - - -
1.3032 245 1.5385 - - - - -
1.3298 250 1.1213 - - - - -
1.3564 255 1.0444 - - - - -
1.3830 260 0.3848 - - - - -
1.4096 265 0.8369 - - - - -
1.4362 270 1.682 - - - - -
1.4628 275 1.9625 - - - - -
1.4894 280 2.0732 - - - - -
1.5160 285 1.8939 - - - - -
1.5426 290 1.5621 - - - - -
1.5691 295 1.5474 - - - - -
1.5957 300 2.1111 - - - - -
1.6223 305 1.8619 - - - - -
1.6489 310 1.1091 - - - - -
1.6755 315 1.8127 - - - - -
1.7021 320 0.8599 - - - - -
1.7287 325 0.9553 - - - - -
1.7553 330 1.2444 - - - - -
1.7819 335 1.6786 - - - - -
1.8085 340 1.2092 - - - - -
1.8351 345 0.8824 - - - - -
1.8617 350 0.4448 - - - - -
1.8883 355 1.116 - - - - -
1.9149 360 1.587 - - - - -
1.9415 365 0.7235 - - - - -
1.9681 370 0.9446 - - - - -
1.9947 375 1.0066 - - - - -
2.0 376 - 0.9570 0.9523 0.9501 0.9501 0.9549
2.0213 380 1.3895 - - - - -
2.0479 385 1.0259 - - - - -
2.0745 390 0.9961 - - - - -
2.1011 395 1.4164 - - - - -
2.1277 400 0.5188 - - - - -
2.1543 405 0.2965 - - - - -
2.1809 410 0.4351 - - - - -
2.2074 415 0.7546 - - - - -
2.2340 420 1.9408 - - - - -
2.2606 425 1.0056 - - - - -
2.2872 430 1.3175 - - - - -
2.3138 435 0.9397 - - - - -
2.3404 440 1.4308 - - - - -
2.3670 445 0.8647 - - - - -
2.3936 450 0.8917 - - - - -
2.4202 455 0.7922 - - - - -
2.4468 460 1.1815 - - - - -
2.4734 465 0.8071 - - - - -
2.5 470 0.1601 - - - - -
2.5266 475 0.7533 - - - - -
2.5532 480 1.351 - - - - -
2.5798 485 1.2948 - - - - -
2.6064 490 1.4087 - - - - -
2.6330 495 2.2427 - - - - -
2.6596 500 0.4735 - - - - -
2.6862 505 0.8377 - - - - -
2.7128 510 0.525 - - - - -
2.7394 515 0.8455 - - - - -
2.7660 520 2.458 - - - - -
2.7926 525 1.2906 - - - - -
2.8191 530 1.0234 - - - - -
2.8457 535 0.3733 - - - - -
2.8723 540 0.388 - - - - -
2.8989 545 1.2155 - - - - -
2.9255 550 1.0288 - - - - -
2.9521 555 1.0578 - - - - -
2.9787 560 0.1793 - - - - -
3.0 564 - 0.9653 0.9714 0.9705 0.9609 0.9679
3.0053 565 1.0141 - - - - -
3.0319 570 0.6978 - - - - -
3.0585 575 0.6066 - - - - -
3.0851 580 0.2444 - - - - -
3.1117 585 0.581 - - - - -
3.1383 590 1.3544 - - - - -
3.1649 595 0.9379 - - - - -
3.1915 600 1.0088 - - - - -
3.2181 605 1.6689 - - - - -
3.2447 610 0.3204 - - - - -
3.2713 615 0.5433 - - - - -
3.2979 620 0.7225 - - - - -
3.3245 625 1.7695 - - - - -
3.3511 630 0.7472 - - - - -
3.3777 635 1.0883 - - - - -
3.4043 640 1.1863 - - - - -
3.4309 645 1.7163 - - - - -
3.4574 650 2.8196 - - - - -
3.4840 655 1.5015 - - - - -
3.5106 660 1.3862 - - - - -
3.5372 665 0.775 - - - - -
3.5638 670 1.2385 - - - - -
3.5904 675 0.9472 - - - - -
3.6170 680 0.6458 - - - - -
3.6436 685 0.8308 - - - - -
3.6702 690 1.0864 - - - - -
3.6968 695 1.0715 - - - - -
3.7234 700 1.5082 - - - - -
3.75 705 0.5028 - - - - -
3.7766 710 1.1525 - - - - -
3.8032 715 0.5829 - - - - -
3.8298 720 0.6168 - - - - -
3.8564 725 1.0185 - - - - -
3.8830 730 1.2545 - - - - -
3.9096 735 0.5604 - - - - -
3.9362 740 0.6879 - - - - -
3.9628 745 0.9936 - - - - -
3.9894 750 0.5786 - - - - -
4.0 752 - 0.9774 0.9818 0.9731 0.98 0.9792
4.0160 755 0.908 - - - - -
4.0426 760 0.988 - - - - -
4.0691 765 0.2616 - - - - -
4.0957 770 1.1475 - - - - -
4.1223 775 1.7832 - - - - -
4.1489 780 0.7522 - - - - -
4.1755 785 1.4473 - - - - -
4.2021 790 0.7194 - - - - -
4.2287 795 0.0855 - - - - -
4.2553 800 1.151 - - - - -
4.2819 805 1.5109 - - - - -
4.3085 810 0.7462 - - - - -
4.3351 815 0.4697 - - - - -
4.3617 820 1.1215 - - - - -
4.3883 825 1.3527 - - - - -
4.4149 830 0.8995 - - - - -
4.4415 835 1.0011 - - - - -
4.4681 840 1.1168 - - - - -
4.4947 845 1.3105 - - - - -
4.5213 850 0.2855 - - - - -
4.5479 855 1.3223 - - - - -
4.5745 860 0.6377 - - - - -
4.6011 865 1.2196 - - - - -
4.6277 870 1.257 - - - - -
4.6543 875 0.93 - - - - -
4.6809 880 0.8831 - - - - -
4.7074 885 0.23 - - - - -
4.7340 890 0.9771 - - - - -
4.7606 895 1.026 - - - - -
4.7872 900 1.4671 - - - - -
4.8138 905 0.8719 - - - - -
4.8404 910 0.9108 - - - - -
4.8670 915 1.359 - - - - -
4.8936 920 1.3237 - - - - -
4.9202 925 0.6591 - - - - -
4.9468 930 0.405 - - - - -
4.9734 935 1.1984 - - - - -
5.0 940 0.5747 0.9774 0.9818 0.9731 0.9774 0.9766
  • The saved checkpoint corresponds to the final row (epoch 5.0, step 940), whose metrics match the Evaluation section above.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1
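
To approximate this environment, the versions above can be pinned at install time (a sketch; newer compatible releases will usually also work):

pip install sentence-transformers==3.0.1 transformers==4.42.4 torch==2.3.1 accelerate==0.32.1 datasets==2.21.0 tokenizers==0.19.1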

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}