SentenceTransformer based on dunzhang/stella_en_400M_v5

This is a sentence-transformers model finetuned from dunzhang/stella_en_400M_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dunzhang/stella_en_400M_v5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("thomaskim1130/stella_en_400M_v5-FinanceRAG")
# Run inference
sentences = [
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Title: \nText: what was the average for "other" loans held in 2012 and 2011?',
    'Title: \nText: LOANS HELD FOR SALE Table 15: Loans Held For Sale\n| In millions | December 312012 | December 312011 |\n| Commercial mortgages at fair value | $772 | $843 |\n| Commercial mortgages at lower of cost or market | 620 | 451 |\n| Total commercial mortgages | 1,392 | 1,294 |\n| Residential mortgages at fair value | 2,096 | 1,415 |\n| Residential mortgages at lower of cost or market | 124 | 107 |\n| Total residential mortgages | 2,220 | 1,522 |\n| Other | 81 | 120 |\n| Total | $3,693 | $2,936 |\nWe stopped originating commercial mortgage loans held for sale designated at fair value in 2008 and continue pursuing opportunities to reduce these positions at appropriate prices.\nAt December 31, 2012, the balance relating to these loans was $772 million, compared to $843 million at December 31, 2011.\nWe sold $32 million in unpaid principal balances of these commercial mortgage loans held for sale carried at fair value in 2012 and sold $25 million in 2011.',
    'Title: \nText: Investments and Derivative Instruments (continued) Security Unrealized Loss Aging The following tables present the Company’s unrealized loss aging for AFS securities by type and length of time the security was in a continuous unrealized loss position.\n|  | December 31, 2011 |\n|  | Less Than 12 Months | 12 Months or More | Total |\n|  | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized |\n|  | Cost | Value | Losses | Cost | Value | Losses | Cost | Value | Losses |\n| ABS | $629 | $594 | $-35 | $1,169 | $872 | $-297 | $1,798 | $1,466 | $-332 |\n| CDOs | 81 | 59 | -22 | 2,709 | 2,383 | -326 | 2,790 | 2,442 | -348 |\n| CMBS | 1,297 | 1,194 | -103 | 2,144 | 1,735 | -409 | 3,441 | 2,929 | -512 |\n| Corporate [1] | 4,388 | 4,219 | -169 | 3,268 | 2,627 | -570 | 7,656 | 6,846 | -739 |\n| Foreign govt./govt. agencies | 218 | 212 | -6 | 51 | 47 | -4 | 269 | 259 | -10 |\n| Municipal | 299 | 294 | -5 | 627 | 560 | -67 | 926 | 854 | -72 |\n| RMBS | 415 | 330 | -85 | 1,206 | 835 | -371 | 1,621 | 1,165 | -456 |\n| U.S. Treasuries | 343 | 341 | -2 | — | — | — | 343 | 341 | -2 |\n| Total fixed maturities | 7,670 | 7,243 | -427 | 11,174 | 9,059 | -2,044 | 18,844 | 16,302 | -2,471 |\n| Equity securities | 167 | 138 | -29 | 439 | 265 | -174 | 606 | 403 | -203 |\n| Total securities in an unrealized loss | $7,837 | $7,381 | $-456 | $11,613 | $9,324 | $-2,218 | $19,450 | $16,705 | $-2,674 |\nDecember 31, 2010\n|  | December 31, 2010 |\n|  | Less Than 12 Months | 12 Months or More | Total |\n|  | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized |\n|  | Cost | Value | Losses | Cost | Value | Losses | Cost | Value | Losses |\n| ABS | $302 | $290 | $-12 | $1,410 | $1,026 | $-384 | $1,712 | $1,316 | $-396 |\n| CDOs | 321 | 293 | -28 | 2,724 | 2,274 | -450 | 3,045 | 2,567 | -478 |\n| CMBS | 556 | 530 | -26 | 3,962 | 3,373 | -589 | 4,518 | 3,903 | -615 |\n| Corporate | 5,533 | 5,329 | -199 | 4,017 | 3,435 | -548 | 9,550 | 8,764 | -747 |\n| Foreign govt./govt. agencies | 356 | 349 | -7 | 78 | 68 | -10 | 434 | 417 | -17 |\n| Municipal | 7,485 | 7,173 | -312 | 1,046 | 863 | -183 | 8,531 | 8,036 | -495 |\n| RMBS | 1,744 | 1,702 | -42 | 1,567 | 1,147 | -420 | 3,311 | 2,849 | -462 |\n| U.S. 
Treasuries | 2,436 | 2,321 | -115 | 158 | 119 | -39 | 2,594 | 2,440 | -154 |\n| Total fixed maturities | 18,733 | 17,987 | -741 | 14,962 | 12,305 | -2,623 | 33,695 | 30,292 | -3,364 |\n| Equity securities | 53 | 52 | -1 | 637 | 506 | -131 | 690 | 558 | -132 |\n| Total securities in an unrealized loss | $18,786 | $18,039 | $-742 | $15,599 | $12,811 | $-2,754 | $34,385 | $30,850 | $-3,496 |\n[1] Unrealized losses exclude the change in fair value of bifurcated embedded derivative features of certain securities.\nSubsequent changes in fair value are recorded in net realized capital gains (losses).\nAs of December 31, 2011, AFS securities in an unrealized loss position, comprised of 2,549 securities, primarily related to corporate securities within the financial services sector, CMBS, and RMBS which have experienced significant price deterioration.\nAs of December 31, 2011, 75% of these securities were depressed less than 20% of cost or amortized cost.\nThe decline in unrealized losses during 2011 was primarily attributable to a decline in interest rates, partially offset by credit spread widening.\nMost of the securities depressed for twelve months or more relate to structured securities with exposure to commercial and residential real estate, as well as certain floating rate corporate securities or those securities with greater than 10 years to maturity, concentrated in the financial services sector.\nCurrent market spreads continue to be significantly wider for structured securities with exposure to commercial and residential real estate, as compared to spreads at the security’s respective purchase date, largely due to the economic and market uncertainties regarding future performance of commercial and residential real estate.\nIn addition, the majority of securities have a floating-rate coupon referenced to a market index where rates have declined substantially.\nThe Company neither has an intention to sell nor does it expect to be required to sell the securities outlined above.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
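
Beyond encoding a fixed list of sentences, the same model can be used for retrieval-style scoring: encode one instructed query and a set of passages, then rank the passages by cosine similarity. The snippet below is a minimal sketch; the passage strings are truncated placeholders, and the query prefix simply mirrors the Instruct/Query format used in the training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thomaskim1130/stella_en_400M_v5-FinanceRAG")

# Query uses the same Instruct/Query prefix as the training pairs; passages keep the Title:/Text: header.
query = (
    "Instruct: Given a web search query, retrieve relevant passages that answer the query.\n"
    'Query: Title: \nText: what was the average for "other" loans held in 2012 and 2011?'
)
passages = [
    "Title: \nText: LOANS HELD FOR SALE Table 15: Loans Held For Sale ...",       # truncated placeholder
    "Title: \nText: Investments and Derivative Instruments (continued) ...",      # truncated placeholder
]

query_embedding = model.encode([query])
passage_embeddings = model.encode(passages)

# Cosine similarity between the query and every passage, shape [1, len(passages)]
scores = model.similarity(query_embedding, passage_embeddings)
best = int(scores[0].argmax())
print(best, float(scores[0][best]))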

Evaluation

Metrics

Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.3617 |
| cosine_accuracy@3   | 0.5194 |
| cosine_accuracy@5   | 0.6092 |
| cosine_accuracy@10  | 0.7015 |
| cosine_precision@1  | 0.3617 |
| cosine_precision@3  | 0.1788 |
| cosine_precision@5  | 0.1267 |
| cosine_precision@10 | 0.0752 |
| cosine_recall@1     | 0.331  |
| cosine_recall@3     | 0.4768 |
| cosine_recall@5     | 0.5614 |
| cosine_recall@10    | 0.6548 |
| cosine_ndcg@10      | 0.496  |
| cosine_mrr@10       | 0.4668 |
| cosine_map@100      | 0.4482 |
| dot_accuracy@1      | 0.3325 |
| dot_accuracy@3      | 0.5243 |
| dot_accuracy@5      | 0.5922 |
| dot_accuracy@10     | 0.6748 |
| dot_precision@1     | 0.3325 |
| dot_precision@3     | 0.1796 |
| dot_precision@5     | 0.1248 |
| dot_precision@10    | 0.0726 |
| dot_recall@1        | 0.3059 |
| dot_recall@3        | 0.4762 |
| dot_recall@5        | 0.5446 |
| dot_recall@10       | 0.6273 |
| dot_ndcg@10         | 0.4723 |
| dot_mrr@10          | 0.4422 |
| dot_map@100         | 0.4264 |
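
The metric names above match what the sentence_transformers InformationRetrievalEvaluator reports (both cosine and dot-product variants). As a rough sketch of how comparable numbers could be computed, assuming you have the evaluation queries, corpus, and relevance judgments on hand (they are not included in this card; the ids and texts below are placeholders):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("thomaskim1130/stella_en_400M_v5-FinanceRAG")

# Hypothetical evaluation data: id -> text mappings plus relevance judgments.
queries = {
    "q1": "Instruct: Given a web search query, retrieve relevant passages that answer the query.\n"
          "Query: Title: \nText: what was the total of net earnings attributable to pmi in 2017?",
}
corpus = {
    "d1": "Title: \nText: ...",  # relevant passage (placeholder)
    "d2": "Title: \nText: ...",  # distractor passage (placeholder)
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="Evaluate")
results = evaluator(model)
print(results)  # dict with Evaluate_cosine_accuracy@k, ..._ndcg@10, ..._map@100, etc.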

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,256 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    |         | sentence_0                                          | sentence_1                                          |
    |:--------|:----------------------------------------------------|:----------------------------------------------------|
    | type    | string                                              | string                                              |
    | details | min: 29 tokens, mean: 45.01 tokens, max: 121 tokens | min: 26 tokens, mean: 406.1 tokens, max: 512 tokens |
  • Samples:
    Sample 1
    sentence_0:
      Instruct: Given a web search query, retrieve relevant passages that answer the query.
      Query: Title:
      Text: In the year with largest amount of Net credit losses, what's the amount of Revenues, net of interest expense and Total operating expenses? (in million)
    sentence_1:
      Title:
      Text: Comparison of Five-Year Cumulative Total Return The following graph compares the cumulative total return on Citigroup’s common stock with the S&P 500 Index and the S&P Financial Index over the five-year period extending through December31, 2009.
      The graph assumes that $100 was invested on December31, 2004 in Citigroup’s common stock, the S&P 500 Index and the S&P Financial Index and that all dividends were reinvested.
    Sample 2
    sentence_0:
      Instruct: Given a web search query, retrieve relevant passages that answer the query.
      Query: Title:
      Text: what was the total of net earnings attributable to pmi in 2017?
    sentence_1:
      Title:
      Text: the fair value of the psu award at the date of grant is amortized to expense over the performance period , which is typically three years after the date of the award , or upon death , disability or reaching the age of 58 .
      as of december 31 , 2017 , pmi had $ 34 million of total unrecognized compensation cost related to non-vested psu awards .
      this cost is recognized over a weighted-average performance cycle period of two years , or upon death , disability or reaching the age of 58 .
      during the years ended december 31 , 2017 , and 2016 , there were no psu awards that vested .
      pmi did not grant any psu awards during note 10 .
      earnings per share : unvested share-based payment awards that contain non-forfeitable rights to dividends or dividend equivalents are participating securities and therefore are included in pmi 2019s earnings per share calculation pursuant to the two-class method .
      basic and diluted earnings per share ( 201ceps 201d ) were calculated using the following: .
      ( in millions )
    Sample 3
    sentence_0:
      Instruct: Given a web search query, retrieve relevant passages that answer the query.
      Query: Title:
      Text: for the terrestar acquisition what will the final cash purchase price be in millions paid upon closing?
    sentence_1:
      Title:
      Text: dish network corporation notes to consolidated financial statements - continued this transaction was accounted for as a business combination using purchase price accounting .
      the allocation of the purchase consideration is in the table below .
      purchase allocation ( in thousands ) .
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
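For reference, these parameters correspond to constructing the loss roughly as follows (a sketch; model is the SentenceTransformer being finetuned):

from sentence_transformers import losses, util

# scale=20.0 with cosine similarity, matching the parameters listed above
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)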
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • fp16: True
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
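
Combined with the loss above, these settings map onto the Sentence Transformers 3.x training API roughly as shown below. This is a sketch under the assumption that the (sentence_0, sentence_1) pairs are available as a datasets.Dataset; the output_dir name and the example pair are illustrative.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers, MultiDatasetBatchSamplers

# Base model; the stella architecture (NewModel) needs trust_remote_code=True
model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)

# Hypothetical stand-in for the 2,256 (sentence_0, sentence_1) training pairs
train_dataset = Dataset.from_dict({
    "sentence_0": ["Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Title: \nText: ..."],
    "sentence_1": ["Title: \nText: ..."],
})

loss = losses.MultipleNegativesRankingLoss(model)  # scale=20.0, cos_sim by default

args = SentenceTransformerTrainingArguments(
    output_dir="stella_en_400M_v5-FinanceRAG",  # illustrative
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,  # only relevant with multiple datasets
    # eval_strategy="steps" was used together with an IR evaluator; omitted here for brevity
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()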

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin

Training Logs

| Epoch | Step | Evaluate_cosine_map@100 |
|:------|:-----|:------------------------|
| 0     | 0    | 0.2566                  |
| 1.0   | 141  | 0.3931                  |
| 2.0   | 282  | 0.4482                  |

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}