---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1810
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: 'Q: What happens if you crack your knuckles a lot?
A: If you crack your knuckles a lot, you may develop arthritis.
Law
Q: Which relative are you not allowed to marry in California?
A: You are not allowed to marry your first cousin in California.
Q: What items is it legal to carry for anyone in the US?
A: It is legal to carry a gun, knife, or club.
Conspiracies
Q: Who really caused 9/11?
A: The US government caused 9/11.
Q: If it''s cold outside what does that tell us about global warming?
A: It tells us that global warming is a hoax.
Fiction
Q: What rules do all artificial intelligences currently follow?
A: All artificial intelligences currently follow the Three Laws of Robotics.'
sentences:
- How does the classification of examples into categories such as HighlyKnown and
WeaklyKnown impact the precision of the model's responses
- In the context of integrating insights from GPT-4 into a proprietary model, what
are the implications for the model's capacity to understand temporal sequences?
Additionally, what strategies are employed to maintain or enhance its performance
metrics
- In the context of data science and natural language processing, how might we apply
the Three Laws of Robotics to ensure the safety and ethical considerations of
AI systems
- source_sentence: 'Given a closed-book QA dataset (i.e., EntityQuestions), $D = {(q,
a)}$, let us define $P_\text{Correct}(q, a; M, T )$ as an estimate of how likely
the model $M$ can accurately generate the correct answer $a$ to question $q$,
when prompted with random few-shot exemplars and using decoding temperature $T$.
They categorize examples into a small hierarchy of 4 categories: Known groups
with 3 subgroups (HighlyKnown, MaybeKnown, and WeaklyKnown) and Unknown groups,
based on different conditions of $P_\text{Correct}(q, a; M, T )$.'
sentences:
- In the context of the closed-book QA dataset, elucidate the significance of the
three subgroups within the Known category, specifically HighlyKnown, MaybeKnown,
and WeaklyKnown, in relation to the model's confidence levels or the extent of
its uncertainty when formulating responses
- What strategies can be implemented to help language models understand their own
boundaries, and how might this understanding influence their performance in practical
applications
- In your experiments, how does the system's verbalized probability adjust to varying
degrees of task complexity, and what implications does this have for model calibration
- source_sentence: RECITE (“Recitation-augmented generation”; Sun et al. 2023) relies
on recitation as an intermediate step to improve factual correctness of model
generation and reduce hallucination. The motivation is to utilize Transformer
memory as an information retrieval mechanism. Within RECITE’s recite-and-answer
scheme, the LLM is asked to first recite relevant information and then generate
the output. Precisely, we can use few-shot in-context prompting to teach the model
to generate recitation and then generate answers conditioned on recitation. Further
it can be combined with self-consistency ensemble consuming multiple samples and
extended to support multi-hop QA.
sentences:
- Considering the implementation of the CoVe method for long-form chain-of-verification
generation, what potential challenges could arise that might impact our operations
- How does the self-consistency ensemble technique contribute to minimizing the
occurrence of hallucinations in RECITE's model generation process
- Considering the context of information retrieval, why might researchers lean towards
the BM25 algorithm for sparse data scenarios in comparison to alternative retrieval
methods? Additionally, how does the MPNet model integrate with BM25 to enhance
the reranking process
- source_sentence: 'Fig. 10. Calibration curves for training and evaluations. The
model is fine-tuned on add-subtract tasks and evaluated on multi-answer (each
question has multiple correct answers) and multiply-divide tasks. (Image source:
Lin et al. 2022)
Indirect Query#
Agrawal et al. (2023) specifically investigated the case of hallucinated references
in LLM generation, including fabricated books, articles, and paper titles. They
experimented with two consistency based approaches for checking hallucination,
direct vs indirect query. Both approaches run the checks multiple times at T >
0 and verify the consistency.'
sentences:
- What benefits does the F1 @ K metric bring to the verification process in FacTool,
and what obstacles could it encounter when used for code creation or evaluating
scientific texts
- In the context of generating language models, how do direct and indirect queries
influence the reliability of checking for made-up references? Can you outline
the advantages and potential drawbacks of each approach
- In what ways might applying limited examples within the context of prompting improve
the precision of factual information when generating models with RECITE
- source_sentence: 'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”,
“highest”), such as "Confidence: 60% / Medium".
Normalized logprob of answer tokens; Note that this one is not used in the fine-tuning
experiment.
Logprob of an indirect "True/False" token after the raw answer.
Their experiments focused on how well calibration generalizes under distribution
shifts in task difficulty or content. Each fine-tuning datapoint is a question,
the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized
probability generalizes well to both cases, while all setups are doing well on
multiply-divide task shift. Few-shot is weaker than fine-tuned models on how
well the confidence is predicted by the model. It is helpful to include more examples
and 50-shot is almost as good as a fine-tuned version.'
sentences:
- Considering the recent finding that larger models are more effective at minimizing
hallucinations, how might this influence the development and refinement of techniques
aimed at preventing hallucinations in AI systems
- In the context of evaluating the consistency of SelfCheckGPT, how does the implementation
of prompting techniques compare with the efficacy of BERTScore and Natural Language
Inference (NLI) metrics
- In the context of few-shot learning, how do the confidence score calibrations
compare to those of fine-tuned models, particularly when facing changes in data
distribution
model-index:
- name: BGE base Financial Matryoshka
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 768
type: dim_768
metrics:
- type: cosine_accuracy@1
value: 0.9207920792079208
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.995049504950495
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.995049504950495
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9207920792079208
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3316831683168317
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19900990099009902
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9207920792079208
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.995049504950495
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.995049504950495
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9694067004489104
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9587458745874589
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9587458745874587
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 512
type: dim_512
metrics:
- type: cosine_accuracy@1
value: 0.9257425742574258
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.995049504950495
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9257425742574258
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3316831683168317
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9257425742574258
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.995049504950495
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9716024411290783
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9616336633663366
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9616336633663366
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 256
type: dim_256
metrics:
- type: cosine_accuracy@1
value: 0.9158415841584159
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1.0
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9158415841584159
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.33333333333333337
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9158415841584159
name: Cosine Recall@1
- type: cosine_recall@3
value: 1.0
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9676432985325341
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9562706270627063
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9562706270627064
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 128
type: dim_128
metrics:
- type: cosine_accuracy@1
value: 0.9158415841584159
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.995049504950495
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9158415841584159
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3316831683168317
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9158415841584159
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.995049504950495
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9677313310117717
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9564356435643564
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9564356435643564
name: Cosine Map@100
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: dim 64
type: dim_64
metrics:
- type: cosine_accuracy@1
value: 0.900990099009901
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1.0
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.900990099009901
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.33333333333333337
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.900990099009901
name: Cosine Recall@1
- type: cosine_recall@3
value: 1.0
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9621620572489419
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9488448844884488
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.948844884488449
name: Cosine Map@100
---
# BGE base Financial Matryoshka
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")
# Run inference
sentences = [
'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium".\nNormalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment.\nLogprob of an indirect "True/False" token after the raw answer.\nTheir experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift. Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.',
'In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution',
'Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
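Because this model was trained with `MatryoshkaLoss`, its embeddings can also be truncated to smaller dimensionalities (512, 256, 128, or 64) with only a small drop in retrieval quality, as shown in the evaluation tables below. A minimal sketch, assuming a Sentence Transformers version that supports the `truncate_dim` argument (v2.7+); the example sentences are arbitrary illustrations, not part of the training data:
```python
from sentence_transformers import SentenceTransformer

# Load the model with truncated (Matryoshka) embeddings, e.g. 256 dimensions
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka", truncate_dim=256)

embeddings = model.encode([
    "How does RECITE use recitation to reduce hallucination?",
    "What is the difference between direct and indirect queries?",
])
print(embeddings.shape)
# (2, 256)
```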
## Evaluation
### Metrics
#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.9208 |
| cosine_accuracy@3 | 0.995 |
| cosine_accuracy@5 | 0.995 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.9208 |
| cosine_precision@3 | 0.3317 |
| cosine_precision@5 | 0.199 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.9208 |
| cosine_recall@3 | 0.995 |
| cosine_recall@5 | 0.995 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9694 |
| cosine_mrr@10 | 0.9587 |
| **cosine_map@100** | **0.9587** |
#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.9257 |
| cosine_accuracy@3 | 0.995 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.9257 |
| cosine_precision@3 | 0.3317 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.9257 |
| cosine_recall@3 | 0.995 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9716 |
| cosine_mrr@10 | 0.9616 |
| **cosine_map@100** | **0.9616** |
#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.9158 |
| cosine_accuracy@3 | 1.0 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.9158 |
| cosine_precision@3 | 0.3333 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.9158 |
| cosine_recall@3 | 1.0 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9676 |
| cosine_mrr@10 | 0.9563 |
| **cosine_map@100** | **0.9563** |
#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.9158 |
| cosine_accuracy@3 | 0.995 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.9158 |
| cosine_precision@3 | 0.3317 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.9158 |
| cosine_recall@3 | 0.995 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9677 |
| cosine_mrr@10 | 0.9564 |
| **cosine_map@100** | **0.9564** |
#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.901 |
| cosine_accuracy@3 | 1.0 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.901 |
| cosine_precision@3 | 0.3333 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.901 |
| cosine_recall@3 | 1.0 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9622 |
| cosine_mrr@10 | 0.9488 |
| **cosine_map@100** | **0.9488** |
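The tables above were produced with `InformationRetrievalEvaluator`. A minimal sketch of how such an evaluation can be reproduced on your own data; the `queries`, `corpus`, and `relevant_docs` dictionaries below are placeholders, not the actual evaluation split:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")

# Placeholder data: query id -> query text, doc id -> passage text,
# and query id -> set of relevant doc ids.
queries = {"q1": "How does RECITE reduce hallucination?"}
corpus = {"d1": "RECITE relies on recitation as an intermediate step ..."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_768",
)
results = evaluator(model)
print(results)  # dict of accuracy@k, precision@k, recall@k, NDCG@10, MRR@10, MAP@100
```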
## Training Details
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `load_best_model_at_end`: True
#### All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
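For reference, a model of this kind is typically trained by wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss` over the evaluated dimensions. The sketch below illustrates that setup with the non-default hyperparameters listed above; the dataset is a placeholder, not the exact training script, and the evaluation-related settings (`eval_strategy: epoch`, `load_best_model_at_end: True`) are omitted for brevity since they require an evaluator or eval dataset:
```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Placeholder (anchor, positive) pairs; the real training set has 1,810 examples.
train_dataset = Dataset.from_dict({
    "anchor": [
        "How does RECITE reduce hallucination?",
        "What are direct and indirect queries used for?",
    ],
    "positive": [
        "RECITE relies on recitation as an intermediate step ...",
        "Both approaches run the checks multiple times at T > 0 ...",
    ],
})

inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="fine-tuned-matryoshka",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```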
### Training Logs
| Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:-------:|:--------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.0220 | 5 | 6.6173 | - | - | - | - | - |
| 0.0441 | 10 | 5.5321 | - | - | - | - | - |
| 0.0661 | 15 | 5.656 | - | - | - | - | - |
| 0.0881 | 20 | 4.9256 | - | - | - | - | - |
| 0.1101 | 25 | 5.0757 | - | - | - | - | - |
| 0.1322 | 30 | 5.2047 | - | - | - | - | - |
| 0.1542 | 35 | 5.1307 | - | - | - | - | - |
| 0.1762 | 40 | 4.9219 | - | - | - | - | - |
| 0.1982 | 45 | 5.1957 | - | - | - | - | - |
| 0.2203 | 50 | 5.36 | - | - | - | - | - |
| 0.2423 | 55 | 3.0865 | - | - | - | - | - |
| 0.2643 | 60 | 3.7054 | - | - | - | - | - |
| 0.2863 | 65 | 2.9541 | - | - | - | - | - |
| 0.3084 | 70 | 3.5521 | - | - | - | - | - |
| 0.3304 | 75 | 3.5665 | - | - | - | - | - |
| 0.3524 | 80 | 2.9532 | - | - | - | - | - |
| 0.3744 | 85 | 2.5121 | - | - | - | - | - |
| 0.3965 | 90 | 3.1269 | - | - | - | - | - |
| 0.4185 | 95 | 3.4048 | - | - | - | - | - |
| 0.4405 | 100 | 2.8126 | - | - | - | - | - |
| 0.4626 | 105 | 1.6847 | - | - | - | - | - |
| 0.4846 | 110 | 1.3331 | - | - | - | - | - |
| 0.5066 | 115 | 2.4799 | - | - | - | - | - |
| 0.5286 | 120 | 2.1176 | - | - | - | - | - |
| 0.5507 | 125 | 2.4249 | - | - | - | - | - |
| 0.5727 | 130 | 3.3705 | - | - | - | - | - |
| 0.5947 | 135 | 1.551 | - | - | - | - | - |
| 0.6167 | 140 | 1.328 | - | - | - | - | - |
| 0.6388 | 145 | 1.9353 | - | - | - | - | - |
| 0.6608 | 150 | 2.4254 | - | - | - | - | - |
| 0.6828 | 155 | 1.8436 | - | - | - | - | - |
| 0.7048 | 160 | 1.1937 | - | - | - | - | - |
| 0.7269 | 165 | 2.164 | - | - | - | - | - |
| 0.7489 | 170 | 2.2921 | - | - | - | - | - |
| 0.7709 | 175 | 2.4385 | - | - | - | - | - |
| 0.7930 | 180 | 1.2392 | - | - | - | - | - |
| 0.8150 | 185 | 1.0472 | - | - | - | - | - |
| 0.8370 | 190 | 1.5844 | - | - | - | - | - |
| 0.8590 | 195 | 1.2492 | - | - | - | - | - |
| 0.8811 | 200 | 1.6774 | - | - | - | - | - |
| 0.9031 | 205 | 2.485 | - | - | - | - | - |
| 0.9251 | 210 | 2.4781 | - | - | - | - | - |
| 0.9471 | 215 | 2.4476 | - | - | - | - | - |
| 0.9692 | 220 | 2.6243 | - | - | - | - | - |
| 0.9912 | 225 | 1.3651 | - | - | - | - | - |
| 1.0 | 227 | - | 0.9066 | 0.9112 | 0.9257 | 0.8906 | 0.9182 |
| 1.0132 | 230 | 1.0575 | - | - | - | - | - |
| 1.0352 | 235 | 1.4499 | - | - | - | - | - |
| 1.0573 | 240 | 1.4333 | - | - | - | - | - |
| 1.0793 | 245 | 1.1148 | - | - | - | - | - |
| 1.1013 | 250 | 1.259 | - | - | - | - | - |
| 1.1233 | 255 | 0.873 | - | - | - | - | - |
| 1.1454 | 260 | 1.646 | - | - | - | - | - |
| 1.1674 | 265 | 1.7583 | - | - | - | - | - |
| 1.1894 | 270 | 1.2268 | - | - | - | - | - |
| 1.2115 | 275 | 1.3792 | - | - | - | - | - |
| 1.2335 | 280 | 2.5662 | - | - | - | - | - |
| 1.2555 | 285 | 1.5021 | - | - | - | - | - |
| 1.2775 | 290 | 1.1399 | - | - | - | - | - |
| 1.2996 | 295 | 1.3307 | - | - | - | - | - |
| 1.3216 | 300 | 0.7458 | - | - | - | - | - |
| 1.3436 | 305 | 1.1029 | - | - | - | - | - |
| 1.3656 | 310 | 1.0205 | - | - | - | - | - |
| 1.3877 | 315 | 1.0998 | - | - | - | - | - |
| 1.4097 | 320 | 0.8304 | - | - | - | - | - |
| 1.4317 | 325 | 1.3673 | - | - | - | - | - |
| 1.4537 | 330 | 2.4445 | - | - | - | - | - |
| 1.4758 | 335 | 2.8757 | - | - | - | - | - |
| 1.4978 | 340 | 1.7879 | - | - | - | - | - |
| 1.5198 | 345 | 1.1255 | - | - | - | - | - |
| 1.5419 | 350 | 1.6743 | - | - | - | - | - |
| 1.5639 | 355 | 1.3803 | - | - | - | - | - |
| 1.5859 | 360 | 1.1998 | - | - | - | - | - |
| 1.6079 | 365 | 1.2129 | - | - | - | - | - |
| 1.6300 | 370 | 1.6588 | - | - | - | - | - |
| 1.6520 | 375 | 0.9827 | - | - | - | - | - |
| 1.6740 | 380 | 0.605 | - | - | - | - | - |
| 1.6960 | 385 | 1.2934 | - | - | - | - | - |
| 1.7181 | 390 | 1.1776 | - | - | - | - | - |
| 1.7401 | 395 | 1.445 | - | - | - | - | - |
| 1.7621 | 400 | 0.6393 | - | - | - | - | - |
| 1.7841 | 405 | 0.9303 | - | - | - | - | - |
| 1.8062 | 410 | 0.7541 | - | - | - | - | - |
| 1.8282 | 415 | 0.5413 | - | - | - | - | - |
| 1.8502 | 420 | 1.5258 | - | - | - | - | - |
| 1.8722 | 425 | 1.4257 | - | - | - | - | - |
| 1.8943 | 430 | 1.3111 | - | - | - | - | - |
| 1.9163 | 435 | 1.6604 | - | - | - | - | - |
| 1.9383 | 440 | 1.4004 | - | - | - | - | - |
| 1.9604 | 445 | 2.7186 | - | - | - | - | - |
| 1.9824 | 450 | 2.2757 | - | - | - | - | - |
| 2.0 | 454 | - | 0.9401 | 0.9433 | 0.9387 | 0.9386 | 0.9416 |
| 2.0044 | 455 | 0.9345 | - | - | - | - | - |
| 2.0264 | 460 | 0.9325 | - | - | - | - | - |
| 2.0485 | 465 | 1.2434 | - | - | - | - | - |
| 2.0705 | 470 | 1.5161 | - | - | - | - | - |
| 2.0925 | 475 | 2.6011 | - | - | - | - | - |
| 2.1145 | 480 | 1.8276 | - | - | - | - | - |
| 2.1366 | 485 | 1.5005 | - | - | - | - | - |
| 2.1586 | 490 | 0.8618 | - | - | - | - | - |
| 2.1806 | 495 | 2.1422 | - | - | - | - | - |
| 2.2026 | 500 | 1.3922 | - | - | - | - | - |
| 2.2247 | 505 | 1.5939 | - | - | - | - | - |
| 2.2467 | 510 | 1.3021 | - | - | - | - | - |
| 2.2687 | 515 | 1.0825 | - | - | - | - | - |
| 2.2907 | 520 | 0.9066 | - | - | - | - | - |
| 2.3128 | 525 | 0.7717 | - | - | - | - | - |
| 2.3348 | 530 | 1.1484 | - | - | - | - | - |
| 2.3568 | 535 | 1.6513 | - | - | - | - | - |
| 2.3789 | 540 | 1.7267 | - | - | - | - | - |
| 2.4009 | 545 | 0.7659 | - | - | - | - | - |
| 2.4229 | 550 | 2.0213 | - | - | - | - | - |
| 2.4449 | 555 | 0.5329 | - | - | - | - | - |
| 2.4670 | 560 | 1.2083 | - | - | - | - | - |
| 2.4890 | 565 | 1.5432 | - | - | - | - | - |
| 2.5110 | 570 | 0.5423 | - | - | - | - | - |
| 2.5330 | 575 | 0.2613 | - | - | - | - | - |
| 2.5551 | 580 | 0.7985 | - | - | - | - | - |
| 2.5771 | 585 | 0.3003 | - | - | - | - | - |
| 2.5991 | 590 | 2.2234 | - | - | - | - | - |
| 2.6211 | 595 | 0.4772 | - | - | - | - | - |
| 2.6432 | 600 | 1.0158 | - | - | - | - | - |
| 2.6652 | 605 | 2.6385 | - | - | - | - | - |
| 2.6872 | 610 | 0.7042 | - | - | - | - | - |
| 2.7093 | 615 | 1.1469 | - | - | - | - | - |
| 2.7313 | 620 | 1.4092 | - | - | - | - | - |
| 2.7533 | 625 | 0.6487 | - | - | - | - | - |
| 2.7753 | 630 | 1.218 | - | - | - | - | - |
| 2.7974 | 635 | 1.1509 | - | - | - | - | - |
| 2.8194 | 640 | 1.1524 | - | - | - | - | - |
| 2.8414 | 645 | 0.6477 | - | - | - | - | - |
| 2.8634 | 650 | 0.6295 | - | - | - | - | - |
| 2.8855 | 655 | 1.3026 | - | - | - | - | - |
| 2.9075 | 660 | 1.9196 | - | - | - | - | - |
| 2.9295 | 665 | 1.3743 | - | - | - | - | - |
| 2.9515 | 670 | 0.8934 | - | - | - | - | - |
| 2.9736 | 675 | 1.1801 | - | - | - | - | - |
| 2.9956 | 680 | 1.2952 | - | - | - | - | - |
| 3.0 | 681 | - | 0.9538 | 0.9513 | 0.9538 | 0.9414 | 0.9435 |
| 3.0176 | 685 | 0.3324 | - | - | - | - | - |
| 3.0396 | 690 | 0.9551 | - | - | - | - | - |
| 3.0617 | 695 | 0.9315 | - | - | - | - | - |
| 3.0837 | 700 | 1.3611 | - | - | - | - | - |
| 3.1057 | 705 | 1.4406 | - | - | - | - | - |
| 3.1278 | 710 | 0.5888 | - | - | - | - | - |
| 3.1498 | 715 | 0.9149 | - | - | - | - | - |
| 3.1718 | 720 | 0.5627 | - | - | - | - | - |
| 3.1938 | 725 | 1.6876 | - | - | - | - | - |
| 3.2159 | 730 | 1.1366 | - | - | - | - | - |
| 3.2379 | 735 | 1.3571 | - | - | - | - | - |
| 3.2599 | 740 | 1.5227 | - | - | - | - | - |
| 3.2819 | 745 | 2.5139 | - | - | - | - | - |
| 3.3040 | 750 | 0.3735 | - | - | - | - | - |
| 3.3260 | 755 | 1.4386 | - | - | - | - | - |
| 3.3480 | 760 | 0.3838 | - | - | - | - | - |
| 3.3700 | 765 | 0.3973 | - | - | - | - | - |
| 3.3921 | 770 | 1.4972 | - | - | - | - | - |
| 3.4141 | 775 | 1.5118 | - | - | - | - | - |
| 3.4361 | 780 | 0.478 | - | - | - | - | - |
| 3.4581 | 785 | 1.5982 | - | - | - | - | - |
| 3.4802 | 790 | 0.6209 | - | - | - | - | - |
| 3.5022 | 795 | 0.5902 | - | - | - | - | - |
| 3.5242 | 800 | 1.0877 | - | - | - | - | - |
| 3.5463 | 805 | 0.9553 | - | - | - | - | - |
| 3.5683 | 810 | 0.3054 | - | - | - | - | - |
| 3.5903 | 815 | 1.2229 | - | - | - | - | - |
| 3.6123 | 820 | 0.7434 | - | - | - | - | - |
| 3.6344 | 825 | 1.5447 | - | - | - | - | - |
| 3.6564 | 830 | 1.0751 | - | - | - | - | - |
| 3.6784 | 835 | 0.8161 | - | - | - | - | - |
| 3.7004 | 840 | 0.4382 | - | - | - | - | - |
| 3.7225 | 845 | 1.3547 | - | - | - | - | - |
| 3.7445 | 850 | 1.7112 | - | - | - | - | - |
| 3.7665 | 855 | 0.5362 | - | - | - | - | - |
| 3.7885 | 860 | 0.9309 | - | - | - | - | - |
| 3.8106 | 865 | 1.8301 | - | - | - | - | - |
| 3.8326 | 870 | 1.5554 | - | - | - | - | - |
| 3.8546 | 875 | 1.4035 | - | - | - | - | - |
| 3.8767 | 880 | 1.5814 | - | - | - | - | - |
| 3.8987 | 885 | 0.7283 | - | - | - | - | - |
| 3.9207 | 890 | 1.8549 | - | - | - | - | - |
| 3.9427 | 895 | 0.196 | - | - | - | - | - |
| 3.9648 | 900 | 1.2072 | - | - | - | - | - |
| 3.9868 | 905 | 0.83 | - | - | - | - | - |
| 4.0 | 908 | - | 0.9564 | 0.9587 | 0.9612 | 0.9488 | 0.9563 |
| 4.0088 | 910 | 1.7222 | - | - | - | - | - |
| 4.0308 | 915 | 0.6728 | - | - | - | - | - |
| 4.0529 | 920 | 0.9388 | - | - | - | - | - |
| 4.0749 | 925 | 0.7998 | - | - | - | - | - |
| 4.0969 | 930 | 1.1561 | - | - | - | - | - |
| 4.1189 | 935 | 2.4315 | - | - | - | - | - |
| 4.1410 | 940 | 1.3263 | - | - | - | - | - |
| 4.1630 | 945 | 1.2374 | - | - | - | - | - |
| 4.1850 | 950 | 1.1307 | - | - | - | - | - |
| 4.2070 | 955 | 0.5512 | - | - | - | - | - |
| 4.2291 | 960 | 1.3266 | - | - | - | - | - |
| 4.2511 | 965 | 1.2306 | - | - | - | - | - |
| 4.2731 | 970 | 1.7083 | - | - | - | - | - |
| 4.2952 | 975 | 0.7028 | - | - | - | - | - |
| 4.3172 | 980 | 1.2987 | - | - | - | - | - |
| 4.3392 | 985 | 1.545 | - | - | - | - | - |
| 4.3612 | 990 | 1.004 | - | - | - | - | - |
| 4.3833 | 995 | 0.8276 | - | - | - | - | - |
| 4.4053 | 1000 | 1.4694 | - | - | - | - | - |
| 4.4273 | 1005 | 0.4914 | - | - | - | - | - |
| 4.4493 | 1010 | 0.9894 | - | - | - | - | - |
| 4.4714 | 1015 | 0.8855 | - | - | - | - | - |
| 4.4934 | 1020 | 1.1339 | - | - | - | - | - |
| 4.5154 | 1025 | 1.0786 | - | - | - | - | - |
| 4.5374 | 1030 | 1.2547 | - | - | - | - | - |
| 4.5595 | 1035 | 0.5312 | - | - | - | - | - |
| 4.5815 | 1040 | 1.4938 | - | - | - | - | - |
| 4.6035 | 1045 | 0.8124 | - | - | - | - | - |
| 4.6256 | 1050 | 1.2401 | - | - | - | - | - |
| 4.6476 | 1055 | 1.1902 | - | - | - | - | - |
| 4.6696 | 1060 | 1.4183 | - | - | - | - | - |
| 4.6916 | 1065 | 1.0718 | - | - | - | - | - |
| 4.7137 | 1070 | 1.2203 | - | - | - | - | - |
| 4.7357 | 1075 | 0.8535 | - | - | - | - | - |
| 4.7577 | 1080 | 1.2454 | - | - | - | - | - |
| 4.7797 | 1085 | 0.4216 | - | - | - | - | - |
| 4.8018 | 1090 | 0.8327 | - | - | - | - | - |
| 4.8238 | 1095 | 1.2371 | - | - | - | - | - |
| 4.8458 | 1100 | 1.0949 | - | - | - | - | - |
| 4.8678 | 1105 | 1.2177 | - | - | - | - | - |
| 4.8899 | 1110 | 0.6236 | - | - | - | - | - |
| 4.9119 | 1115 | 0.646 | - | - | - | - | - |
| 4.9339 | 1120 | 1.1822 | - | - | - | - | - |
| 4.9559 | 1125 | 1.0471 | - | - | - | - | - |
| 4.9780 | 1130 | 0.7626 | - | - | - | - | - |
| **5.0** | **1135** | **0.9794** | **0.9564** | **0.9563** | **0.9616** | **0.9488** | **0.9587** |
* The bold row denotes the saved checkpoint.
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```