
BGE base Financial Matryoshka

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5 on financial question-passage pairs. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
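
For reference, the same stack can be assembled by hand from sentence-transformers modules. The sketch below is illustrative only (loading the published checkpoint, as shown under Usage, is the normal route); it mirrors the printed architecture: a BERT encoder with a 512-token limit, CLS-token pooling to 768 dimensions, and L2 normalization.

from sentence_transformers import SentenceTransformer, models

# Illustrative reconstruction of the architecture above; in practice, load
# "Andresckamilo/bge-base-financial-matryoshka" directly instead.
word_embedding_model = models.Transformer(
    "BAAI/bge-base-en-v1.5", max_seq_length=512, do_lower_case=True
)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode="cls",  # CLS-token pooling, matching the config above
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model, models.Normalize()])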

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Andresckamilo/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'What is the global presence of Lubrizol?',
    'How does The Coca-Cola Company distribute its beverage products globally?',
    'What are the two operating segments of NVIDIA as mentioned in the text?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
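
Because the model was trained with MatryoshkaLoss (see Training Details), its embeddings can also be truncated to 512, 256, 128, or 64 dimensions for cheaper storage and search, at a modest cost in retrieval quality (see Evaluation). A minimal sketch, assuming sentence-transformers >= 2.7, which supports the truncate_dim argument:

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings
# (a truncation of the trained 768-dimensional Matryoshka embeddings).
model_256 = SentenceTransformer(
    "Andresckamilo/bge-base-financial-matryoshka", truncate_dim=256
)
embeddings = model_256.encode([
    "What is the global presence of Lubrizol?",
    "How does The Coca-Cola Company distribute its beverage products globally?",
])
print(embeddings.shape)
# (2, 256)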

Evaluation

Metrics

The five Information Retrieval tables below report the same metrics at Matryoshka embedding dimensions 768, 512, 256, 128, and 64, respectively (cf. the dim_* columns under Training Logs).

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.6957
cosine_accuracy@3 0.8343
cosine_accuracy@5 0.8629
cosine_accuracy@10 0.9086
cosine_precision@1 0.6957
cosine_precision@3 0.2781
cosine_precision@5 0.1726
cosine_precision@10 0.0909
cosine_recall@1 0.6957
cosine_recall@3 0.8343
cosine_recall@5 0.8629
cosine_recall@10 0.9086
cosine_ndcg@10 0.8045
cosine_mrr@10 0.771
cosine_map@100 0.7747

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.7
cosine_accuracy@3 0.8271
cosine_accuracy@5 0.8643
cosine_accuracy@10 0.9157
cosine_precision@1 0.7
cosine_precision@3 0.2757
cosine_precision@5 0.1729
cosine_precision@10 0.0916
cosine_recall@1 0.7
cosine_recall@3 0.8271
cosine_recall@5 0.8643
cosine_recall@10 0.9157
cosine_ndcg@10 0.8073
cosine_mrr@10 0.7726
cosine_map@100 0.7757

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.6929
cosine_accuracy@3 0.82
cosine_accuracy@5 0.8586
cosine_accuracy@10 0.9029
cosine_precision@1 0.6929
cosine_precision@3 0.2733
cosine_precision@5 0.1717
cosine_precision@10 0.0903
cosine_recall@1 0.6929
cosine_recall@3 0.82
cosine_recall@5 0.8586
cosine_recall@10 0.9029
cosine_ndcg@10 0.7979
cosine_mrr@10 0.7643
cosine_map@100 0.7685

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.6857
cosine_accuracy@3 0.81
cosine_accuracy@5 0.8543
cosine_accuracy@10 0.89
cosine_precision@1 0.6857
cosine_precision@3 0.27
cosine_precision@5 0.1709
cosine_precision@10 0.089
cosine_recall@1 0.6857
cosine_recall@3 0.81
cosine_recall@5 0.8543
cosine_recall@10 0.89
cosine_ndcg@10 0.7878
cosine_mrr@10 0.7549
cosine_map@100 0.7596

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.6529
cosine_accuracy@3 0.7571
cosine_accuracy@5 0.8186
cosine_accuracy@10 0.8686
cosine_precision@1 0.6529
cosine_precision@3 0.2524
cosine_precision@5 0.1637
cosine_precision@10 0.0869
cosine_recall@1 0.6529
cosine_recall@3 0.7571
cosine_recall@5 0.8186
cosine_recall@10 0.8686
cosine_ndcg@10 0.7557
cosine_mrr@10 0.7201
cosine_map@100 0.7249
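
The tables above can be reproduced in outline with the InformationRetrievalEvaluator at each truncation dimension. The sketch below is illustrative: the queries, corpus, and relevant_docs dictionaries are hypothetical toy stand-ins for the held-out financial QA split actually used, which is not published with this card.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Hypothetical toy data; the real evaluation used a held-out split of the
# financial question/passage pairs described under Training Details.
queries = {"q1": "How does Chubb respond to the risks associated with climate change?"}
corpus = {"d1": "Chubb mitigates exposure to climate change risk by ceding catastrophe risk ..."}
relevant_docs = {"q1": {"d1"}}

for dim in [768, 512, 256, 128, 64]:
    model = SentenceTransformer(
        "Andresckamilo/bge-base-financial-matryoshka", truncate_dim=dim
    )
    evaluator = InformationRetrievalEvaluator(
        queries=queries, corpus=corpus, relevant_docs=relevant_docs, name=f"dim_{dim}"
    )
    print(dim, evaluator(model))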

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 6 tokens, mean: 45.39 tokens, max: 371 tokens
    • anchor: string; min: 7 tokens, mean: 20.23 tokens, max: 45 tokens
  • Samples:
    • positive: Chubb mitigates exposure to climate change risk by ceding catastrophe risk in our insurance portfolio through both reinsurance and capital markets, and our investment portfolio through the diversification of risk, industry, location, type and duration of security.
      anchor: How does Chubb respond to the risks associated with climate change?
    • positive: Item 8 of Part IV in the Annual Report on Form 10-K details the consolidated financial statements and accompanying notes.
      anchor: What documents are detailed in Item 8 of Part IV of the Annual Report on Form 10-K?
    • positive: While the outcome of this matter cannot be determined at this time, it is not currently expected to have a material adverse impact on our business.
      anchor: Is the outcome of the investigation into Tesla's waste segregation practices currently determinable?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
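
    Equivalently, the loss can be constructed in code roughly as follows (parameter values taken from the JSON above; this is a sketch, not the exact training script):

    from sentence_transformers import SentenceTransformer
    from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

    model = SentenceTransformer("BAAI/bge-base-en-v1.5")
    # MultipleNegativesRankingLoss over (anchor, positive) pairs, wrapped so the
    # loss is also applied to the first 512/256/128/64 dimensions of each embedding.
    inner_loss = MultipleNegativesRankingLoss(model)
    loss = MatryoshkaLoss(
        model,
        inner_loss,
        matryoshka_dims=[768, 512, 256, 128, 64],
        matryoshka_weights=[1, 1, 1, 1, 1],
        n_dims_per_step=-1,
    )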
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
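
For orientation, these non-default values map onto SentenceTransformerTrainingArguments roughly as in the sketch below (output_dir and save_strategy are assumptions; they are not stated in this card).

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # hypothetical output path
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed; must match eval_strategy when load_best_model_at_end=True
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)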

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8122 10 1.521 - - - - -
0.9746 12 - 0.7434 0.7579 0.7641 0.6994 0.7678
1.6244 20 0.6597 - - - - -
1.9492 24 - 0.7583 0.7628 0.7726 0.7219 0.7735
2.4365 30 0.4472 - - - - -
2.9239 36 - 0.7578 0.7661 0.7747 0.7251 0.7753
3.2487 40 0.3865 - - - - -
3.8985 48 - 0.7596 0.7685 0.7757 0.7249 0.7747
  • The row at epoch 3.8985 (step 48) denotes the saved checkpoint; its dim_*_cosine_map@100 values match the Evaluation tables above.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}