
UAE-Large-V1-financial-embeddings-matryoshka

This is a sentence-transformers model finetuned from WhereIsAI/UAE-Large-V1. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: WhereIsAI/UAE-Large-V1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
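
For reference, a minimal sketch of how an equivalent module stack could be assembled by hand with the sentence-transformers models API; the normal route is simply loading the published checkpoint as shown in Usage, and pooling_mode="cls" here mirrors the pooling_mode_cls_token=True setting above.

from sentence_transformers import SentenceTransformer, models

# Transformer module: BERT encoder with a 512-token window, no lowercasing
word_embedding = models.Transformer("WhereIsAI/UAE-Large-V1", max_seq_length=512, do_lower_case=False)

# Pooling module: use the [CLS] token embedding as the sentence embedding
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),  # 1024 for UAE-Large-V1
    pooling_mode="cls",
)

model = SentenceTransformer(modules=[word_embedding, pooling])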

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rbhatia46/UAE-Large-V1-financial-rag-matryoshka")
# Run inference
sentences = [
    'According to Johnson & Johnson’s 2024 guidance report, their pharmaceutical sector was projected to grow by 7% in 2023 after considering crucial factors like the overall market demand, introduction of new drugs and potential impact of patent expirations.',
    'What was the projected growth of Johnson & Johnson’s pharmaceutical sector in 2023?',
    'How is JPMorgan Chase & Co. improving its cybersecurity measures?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
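
Because the model was trained with MatryoshkaLoss over the dimensions [1024, 768, 512, 256, 128, 64] (see Training Details), the embeddings can also be truncated to a smaller size, trading a small amount of retrieval quality (see Evaluation) for memory and speed. A minimal sketch using the truncate_dim argument of SentenceTransformer:

from sentence_transformers import SentenceTransformer

# Keep only the first 256 of the 1024 embedding dimensions
model = SentenceTransformer(
    "rbhatia46/UAE-Large-V1-financial-rag-matryoshka",
    truncate_dim=256,
)

embeddings = model.encode([
    "What is the key risk factor faced by Exxon Mobil in the energy sector?",
    "How is JPMorgan Chase & Co. improving its cybersecurity measures?",
])
print(embeddings.shape)
# (2, 256)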

Evaluation

Metrics

Retrieval metrics are reported separately for each Matryoshka embedding dimension; the six tables below correspond to 1024, 768, 512, 256, 128 and 64 dimensions respectively, matching the dim_*_cosine_map@100 columns in the Training Logs.

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.8316
cosine_accuracy@3 0.9326
cosine_accuracy@5 0.9663
cosine_accuracy@10 0.9896
cosine_precision@1 0.8316
cosine_precision@3 0.3109
cosine_precision@5 0.1933
cosine_precision@10 0.099
cosine_recall@1 0.8316
cosine_recall@3 0.9326
cosine_recall@5 0.9663
cosine_recall@10 0.9896
cosine_ndcg@10 0.9114
cosine_mrr@10 0.8861
cosine_map@100 0.8866

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.829
cosine_accuracy@3 0.9326
cosine_accuracy@5 0.9663
cosine_accuracy@10 0.9845
cosine_precision@1 0.829
cosine_precision@3 0.3109
cosine_precision@5 0.1933
cosine_precision@10 0.0984
cosine_recall@1 0.829
cosine_recall@3 0.9326
cosine_recall@5 0.9663
cosine_recall@10 0.9845
cosine_ndcg@10 0.9098
cosine_mrr@10 0.8854
cosine_map@100 0.8863

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.8238
cosine_accuracy@3 0.9378
cosine_accuracy@5 0.9637
cosine_accuracy@10 0.9845
cosine_precision@1 0.8238
cosine_precision@3 0.3126
cosine_precision@5 0.1927
cosine_precision@10 0.0984
cosine_recall@1 0.8238
cosine_recall@3 0.9378
cosine_recall@5 0.9637
cosine_recall@10 0.9845
cosine_ndcg@10 0.9085
cosine_mrr@10 0.8836
cosine_map@100 0.8844

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.8212
cosine_accuracy@3 0.9326
cosine_accuracy@5 0.9611
cosine_accuracy@10 0.9793
cosine_precision@1 0.8212
cosine_precision@3 0.3109
cosine_precision@5 0.1922
cosine_precision@10 0.0979
cosine_recall@1 0.8212
cosine_recall@3 0.9326
cosine_recall@5 0.9611
cosine_recall@10 0.9793
cosine_ndcg@10 0.9051
cosine_mrr@10 0.8807
cosine_map@100 0.8817

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.8187
cosine_accuracy@3 0.9352
cosine_accuracy@5 0.9611
cosine_accuracy@10 0.9793
cosine_precision@1 0.8187
cosine_precision@3 0.3117
cosine_precision@5 0.1922
cosine_precision@10 0.0979
cosine_recall@1 0.8187
cosine_recall@3 0.9352
cosine_recall@5 0.9611
cosine_recall@10 0.9793
cosine_ndcg@10 0.9031
cosine_mrr@10 0.8782
cosine_map@100 0.8793

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.7979
cosine_accuracy@3 0.9223
cosine_accuracy@5 0.9585
cosine_accuracy@10 0.9793
cosine_precision@1 0.7979
cosine_precision@3 0.3074
cosine_precision@5 0.1917
cosine_precision@10 0.0979
cosine_recall@1 0.7979
cosine_recall@3 0.9223
cosine_recall@5 0.9585
cosine_recall@10 0.9793
cosine_ndcg@10 0.8936
cosine_mrr@10 0.8655
cosine_map@100 0.8667
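
These metrics are the ones reported by the sentence-transformers InformationRetrievalEvaluator, run once per Matryoshka dimension. A minimal sketch of such an evaluation, with purely illustrative query/corpus/relevance dictionaries rather than the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("rbhatia46/UAE-Large-V1-financial-rag-matryoshka")

# Illustrative placeholders: ids -> text, and query id -> set of relevant corpus ids
queries = {"q1": "What is the projected sales growth for Amazon in the next financial year?"}
corpus = {"d1": "Amazon is expected to see a sales growth of 23% in the next financial year, driven by the increased demand for their ecommerce business and strong growth in AWS."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_1024",
)
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100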

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,474 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 15 tokens, mean: 44.84 tokens, max: 112 tokens
    • anchor: string; min: 8 tokens, mean: 18.34 tokens, max: 32 tokens
  • Samples:
    • positive: Exxon Mobil faces substantial risk factors including fluctuating market prices for oil and gas, regulatory environment changes and the potential for catastrophic accidents such as oil spills.
      anchor: What is the key risk factor faced by Exxon Mobil in the energy sector?
    • positive: Tesla’s remarkable revenue growth in 2023 is largely driven by its robust electric vehicle sales in China and the strong demand for its energy storage products.
      anchor: What is the main reason behind Tesla’s revenue growth in 2023?
    • positive: Amazon is expected to see a sales growth of 23% in the next financial year, driven by the increased demand for their ecommerce business and strong growth in AWS. This projection is subject to changes in the market condition and customer spending patterns.
      anchor: What is the projected sales growth for Amazon in the next financial year?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
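
Taken together, the (positive, anchor) columns and the loss configuration above correspond roughly to a setup like the following sketch; the tiny in-line dataset is illustrative only, not the actual 3,474-pair training set.

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("WhereIsAI/UAE-Large-V1")

# (positive, anchor) pairs with the same two columns as the training dataset above
train_dataset = Dataset.from_dict({
    "positive": [
        "Exxon Mobil faces substantial risk factors including fluctuating market prices "
        "for oil and gas, regulatory environment changes and the potential for "
        "catastrophic accidents such as oil spills."
    ],
    "anchor": [
        "What is the key risk factor faced by Exxon Mobil in the energy sector?"
    ],
})

# MultipleNegativesRankingLoss wrapped in MatryoshkaLoss over the six output sizes
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[1024, 768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1, 1],
)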
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
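
As a rough sketch, these non-default values map onto SentenceTransformerTrainingArguments as follows; output_dir and save_strategy are illustrative assumptions, not taken from the card.

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="uae-large-v1-financial-rag-matryoshka",  # assumption: output path not stated in the card
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

These arguments would then be passed, together with the model, training dataset, loss, and evaluators, to a SentenceTransformerTrainer.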

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8807 6 - 0.8708 0.8499 0.8647 0.8705 0.8307 0.8700
1.4679 10 0.7358 - - - - - -
1.9083 13 - 0.8848 0.8724 0.8782 0.8861 0.8617 0.8855
2.9358 20 0.1483 0.8865 0.8793 0.8814 0.8857 0.8667 0.8863
3.5229 24 - 0.8866 0.8793 0.8817 0.8844 0.8667 0.8863
  • The final row (epoch 3.5229, step 24) denotes the saved checkpoint; its metrics match the evaluation tables above.

Framework Versions

  • Python: 3.10.6
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}