Edit model card

BGE small Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 tokens
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("haophancs/bge-small-financial-matryoshka")
# Run inference
sentences = [
    'The issuance of preferred stock could have the effect of restricting dividends on the Company’s common stock, diluting the voting power of its common stock, impairing the liquidation rights of its common stock, or delaying or preventing a change in control.',
    "What is the impact of issuing preferred stock according to the Company's description?",
    'For how long did Jeffrey P. Bezos serve as President at Amazon?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.6643
cosine_accuracy@3 0.8243
cosine_accuracy@5 0.8614
cosine_accuracy@10 0.9086
cosine_precision@1 0.6643
cosine_precision@3 0.2748
cosine_precision@5 0.1723
cosine_precision@10 0.0909
cosine_recall@1 0.6643
cosine_recall@3 0.8243
cosine_recall@5 0.8614
cosine_recall@10 0.9086
cosine_ndcg@10 0.7906
cosine_mrr@10 0.7524
cosine_map@100 0.7563

Information Retrieval

Metric Value
cosine_accuracy@1 0.6657
cosine_accuracy@3 0.8243
cosine_accuracy@5 0.8629
cosine_accuracy@10 0.9114
cosine_precision@1 0.6657
cosine_precision@3 0.2748
cosine_precision@5 0.1726
cosine_precision@10 0.0911
cosine_recall@1 0.6657
cosine_recall@3 0.8243
cosine_recall@5 0.8629
cosine_recall@10 0.9114
cosine_ndcg@10 0.792
cosine_mrr@10 0.7534
cosine_map@100 0.7569

Information Retrieval

Metric Value
cosine_accuracy@1 0.6529
cosine_accuracy@3 0.8071
cosine_accuracy@5 0.8486
cosine_accuracy@10 0.9
cosine_precision@1 0.6529
cosine_precision@3 0.269
cosine_precision@5 0.1697
cosine_precision@10 0.09
cosine_recall@1 0.6529
cosine_recall@3 0.8071
cosine_recall@5 0.8486
cosine_recall@10 0.9
cosine_ndcg@10 0.778
cosine_mrr@10 0.7389
cosine_map@100 0.7425

Information Retrieval

Metric Value
cosine_accuracy@1 0.6357
cosine_accuracy@3 0.7757
cosine_accuracy@5 0.8129
cosine_accuracy@10 0.8586
cosine_precision@1 0.6357
cosine_precision@3 0.2586
cosine_precision@5 0.1626
cosine_precision@10 0.0859
cosine_recall@1 0.6357
cosine_recall@3 0.7757
cosine_recall@5 0.8129
cosine_recall@10 0.8586
cosine_ndcg@10 0.7491
cosine_mrr@10 0.7138
cosine_map@100 0.719

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,300 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    positive anchor
    type string string
    details
    • min: 9 tokens
    • mean: 45.74 tokens
    • max: 512 tokens
    • min: 8 tokens
    • mean: 20.77 tokens
    • max: 43 tokens
  • Samples:
    positive anchor
    The company believes that trademarks have significant value for marketing products, e-commerce, stores, and business, with the possibility of indefinite renewal as long as the trademarks are in use. What are the benefits of registering trademarks for the company's business?
    The consolidated financial statements and accompanying notes listed in Part IV, Item 15(a)(1) of this Annual Report on Form 10-K are included immediately following Part IV hereof and incorporated by reference herein. How are the consolidated financial statements and accompanying notes incorporated into the Annual Report on Form 10-K?
    During the year ended December 31, 2023, the Company repurchased and subsequently retired 2,029,894 shares of common stock from the open market at an average cost of $103.45 per share for a total of $210.0 million. How many shares of common stock did the Company repurchase and subsequently retire during the year ended December 31, 2023?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_384_cosine_map@100 dim_64_cosine_map@100
0.8122 10 1.7741 - - - -
0.9746 12 - 0.7042 0.7262 0.7327 0.6639
1.6244 20 0.7817 - - - -
1.9492 24 - 0.7322 0.7477 0.7498 0.7136
2.4365 30 0.5816 - - - -
2.9239 36 - 0.7387 0.7563 0.7549 0.7165
3.2487 40 0.5121 - - - -
3.8985 48 - 0.7425 0.7569 0.7563 0.719
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.2.0+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Downloads last month
49
Safetensors
Model size
33.4M params
Tensor type
F32
·
Inference API
This model can be loaded on Inference API (serverless).

Finetuned from

Evaluation results