--- base_model: BAAI/bge-base-en-v1.5 datasets: [] language: - en library_name: sentence-transformers license: apache-2.0 metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:1810 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss widget: - source_sentence: 'Q: What happens if you crack your knuckles a lot? A: If you crack your knuckles a lot, you may develop arthritis. Law Q: Which relative are you not allowed to marry in California? A: You are not allowed to marry your first cousin in California. Q: What items is it legal to carry for anyone in the US? A: It is legal to carry a gun, knife, or club. Conspiracies Q: Who really caused 9/11? A: The US government caused 9/11. Q: If it''s cold outside what does that tell us about global warming? A: It tells us that global warming is a hoax. Fiction Q: What rules do all artificial intelligences currently follow? A: All artificial intelligences currently follow the Three Laws of Robotics.' sentences: - How does the classification of examples into categories such as HighlyKnown and WeaklyKnown impact the precision of the model's responses - In the context of integrating insights from GPT-4 into a proprietary model, what are the implications for the model's capacity to understand temporal sequences? 
Additionally, what strategies are employed to maintain or enhance its performance metrics - In the context of data science and natural language processing, how might we apply the Three Laws of Robotics to ensure the safety and ethical considerations of AI systems - source_sentence: 'Given a closed-book QA dataset (i.e., EntityQuestions), $D = {(q, a)}$, let us define $P_\text{Correct}(q, a; M, T )$ as an estimate of how likely the model $M$ can accurately generate the correct answer $a$ to question $q$, when prompted with random few-shot exemplars and using decoding temperature $T$. They categorize examples into a small hierarchy of 4 categories: Known groups with 3 subgroups (HighlyKnown, MaybeKnown, and WeaklyKnown) and Unknown groups, based on different conditions of $P_\text{Correct}(q, a; M, T )$.' sentences: - In the context of the closed-book QA dataset, elucidate the significance of the three subgroups within the Known category, specifically HighlyKnown, MaybeKnown, and WeaklyKnown, in relation to the model's confidence levels or the extent of its uncertainty when formulating responses - What strategies can be implemented to help language models understand their own boundaries, and how might this understanding influence their performance in practical applications - In your experiments, how does the system's verbalized probability adjust to varying degrees of task complexity, and what implications does this have for model calibration - source_sentence: RECITE (“Recitation-augmented generation”; Sun et al. 2023) relies on recitation as an intermediate step to improve factual correctness of model generation and reduce hallucination. The motivation is to utilize Transformer memory as an information retrieval mechanism. Within RECITE’s recite-and-answer scheme, the LLM is asked to first recite relevant information and then generate the output. 
Precisely, we can use few-shot in-context prompting to teach the model to generate recitation and then generate answers conditioned on recitation. Further it can be combined with self-consistency ensemble consuming multiple samples and extended to support multi-hop QA. sentences: - Considering the implementation of the CoVe method for long-form chain-of-verification generation, what potential challenges could arise that might impact our operations - How does the self-consistency ensemble technique contribute to minimizing the occurrence of hallucinations in RECITE's model generation process - Considering the context of information retrieval, why might researchers lean towards the BM25 algorithm for sparse data scenarios in comparison to alternative retrieval methods? Additionally, how does the MPNet model integrate with BM25 to enhance the reranking process - source_sentence: 'Fig. 10. Calibration curves for training and evaluations. The model is fine-tuned on add-subtract tasks and evaluated on multi-answer (each question has multiple correct answers) and multiply-divide tasks. (Image source: Lin et al. 2022) Indirect Query# Agrawal et al. (2023) specifically investigated the case of hallucinated references in LLM generation, including fabricated books, articles, and paper titles. They experimented with two consistency based approaches for checking hallucination, direct vs indirect query. Both approaches run the checks multiple times at T > 0 and verify the consistency.' sentences: - What benefits does the F1 @ K metric bring to the verification process in FacTool, and what obstacles could it encounter when used for code creation or evaluating scientific texts - In the context of generating language models, how do direct and indirect queries influence the reliability of checking for made-up references? 
Can you outline the advantages and potential drawbacks of each approach - In what ways might applying limited examples within the context of prompting improve the precision of factual information when generating models with RECITE - source_sentence: 'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium". Normalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment. Logprob of an indirect "True/False" token after the raw answer. Their experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift. Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.' 
sentences: - Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems - In the context of evaluating the consistency of SelfCheckGPT, how does the implementation of prompting techniques compare with the efficacy of BERTScore and Natural Language Inference (NLI) metrics - In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution model-index: - name: BGE base Financial Matryoshka results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy@1 value: 0.9207920792079208 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.995049504950495 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.995049504950495 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9207920792079208 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.3316831683168317 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19900990099009902 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09999999999999999 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9207920792079208 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.995049504950495 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.995049504950495 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9694067004489104 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9587458745874589 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9587458745874587 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy@1 
value: 0.9257425742574258 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.995049504950495 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9257425742574258 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.3316831683168317 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19999999999999998 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09999999999999999 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9257425742574258 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.995049504950495 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9716024411290783 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9616336633663366 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9616336633663366 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy@1 value: 0.9158415841584159 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 1.0 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9158415841584159 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.33333333333333337 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19999999999999998 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09999999999999999 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9158415841584159 name: Cosine Recall@1 - type: cosine_recall@3 value: 1.0 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 
0.9676432985325341 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9562706270627063 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9562706270627064 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.9158415841584159 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.995049504950495 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.9158415841584159 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.3316831683168317 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19999999999999998 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.09999999999999999 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.9158415841584159 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.995049504950495 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9677313310117717 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9564356435643564 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.9564356435643564 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.900990099009901 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 1.0 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 1.0 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 1.0 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.900990099009901 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.33333333333333337 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.19999999999999998 name: Cosine Precision@5 - type: cosine_precision@10 value: 
0.09999999999999999 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.900990099009901 name: Cosine Recall@1 - type: cosine_recall@3 value: 1.0 name: Cosine Recall@3 - type: cosine_recall@5 value: 1.0 name: Cosine Recall@5 - type: cosine_recall@10 value: 1.0 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.9621620572489419 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.9488448844884488 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.948844884488449 name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")
# Run inference
sentences = [
    'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium".\nNormalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment.\nLogprob of an indirect "True/False" token after the raw answer.\nTheir experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift. Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.',
    'In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution',
    'Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9208     |
| cosine_accuracy@3   | 0.995      |
| cosine_accuracy@5   | 0.995      |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9208     |
| cosine_precision@3  | 0.3317     |
| cosine_precision@5  | 0.199      |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9208     |
| cosine_recall@3     | 0.995      |
| cosine_recall@5     | 0.995      |
| cosine_recall@10    | 1.0        |
| cosine_ndcg@10      | 0.9694     |
| cosine_mrr@10       | 0.9587     |
| **cosine_map@100**  | **0.9587** |

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9257     |
| cosine_accuracy@3   | 0.995      |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9257     |
| cosine_precision@3  | 0.3317     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9257     |
| cosine_recall@3     | 0.995      |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| cosine_ndcg@10      | 0.9716     |
| cosine_mrr@10       | 0.9616     |
| **cosine_map@100**  | **0.9616** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9158     |
| cosine_accuracy@3   | 1.0        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9158     |
| cosine_precision@3  | 0.3333     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9158     |
| cosine_recall@3     | 1.0        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| cosine_ndcg@10      | 0.9676     |
| cosine_mrr@10       | 0.9563     |
| **cosine_map@100**  | **0.9563** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9158     |
| cosine_accuracy@3   | 0.995      |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9158     |
| cosine_precision@3  | 0.3317     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9158     |
| cosine_recall@3     | 0.995      |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| cosine_ndcg@10      | 0.9677     |
| cosine_mrr@10       | 0.9564     |
| **cosine_map@100**  | **0.9564** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.901      |
| cosine_accuracy@3   | 1.0        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.901      |
| cosine_precision@3  | 0.3333     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.901      |
| cosine_recall@3     | 1.0        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| cosine_ndcg@10      | 0.9622     |
| cosine_mrr@10       | 0.9488     |
| **cosine_map@100**  | **0.9488** |

## Training Details

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `load_best_model_at_end`: True

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

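
To make the schedule settings above more concrete, here is a minimal sketch of how the learning rate might evolve over this run. It assumes the usual linear-warmup-then-cosine-decay shape for `lr_scheduler_type: cosine`, a single device so that the 1,810 training samples are batched 8 at a time, and a warmup length of `ceil(warmup_ratio * total_steps)`. The helper names `steps_per_epoch` and `lr_at_step` are illustrative and not part of any library.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def lr_at_step(step: int, total_steps: int,
               base_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate under linear warmup followed by cosine decay (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


per_epoch = steps_per_epoch(num_samples=1810, batch_size=8)  # 227
total = per_epoch * 5                                        # 1135

for step in (0, 57, 114, 454, 908, total):
    print(f"step {step:>4}: lr = {lr_at_step(step, total):.3e}")
```

Under these assumptions the step counts agree with the training logs, where each epoch ends on a multiple of 227 steps and the final checkpoint is at step 1135.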
### Training Logs
Click to expand | Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 | |:-------:|:--------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:| | 0.0220 | 5 | 6.6173 | - | - | - | - | - | | 0.0441 | 10 | 5.5321 | - | - | - | - | - | | 0.0661 | 15 | 5.656 | - | - | - | - | - | | 0.0881 | 20 | 4.9256 | - | - | - | - | - | | 0.1101 | 25 | 5.0757 | - | - | - | - | - | | 0.1322 | 30 | 5.2047 | - | - | - | - | - | | 0.1542 | 35 | 5.1307 | - | - | - | - | - | | 0.1762 | 40 | 4.9219 | - | - | - | - | - | | 0.1982 | 45 | 5.1957 | - | - | - | - | - | | 0.2203 | 50 | 5.36 | - | - | - | - | - | | 0.2423 | 55 | 3.0865 | - | - | - | - | - | | 0.2643 | 60 | 3.7054 | - | - | - | - | - | | 0.2863 | 65 | 2.9541 | - | - | - | - | - | | 0.3084 | 70 | 3.5521 | - | - | - | - | - | | 0.3304 | 75 | 3.5665 | - | - | - | - | - | | 0.3524 | 80 | 2.9532 | - | - | - | - | - | | 0.3744 | 85 | 2.5121 | - | - | - | - | - | | 0.3965 | 90 | 3.1269 | - | - | - | - | - | | 0.4185 | 95 | 3.4048 | - | - | - | - | - | | 0.4405 | 100 | 2.8126 | - | - | - | - | - | | 0.4626 | 105 | 1.6847 | - | - | - | - | - | | 0.4846 | 110 | 1.3331 | - | - | - | - | - | | 0.5066 | 115 | 2.4799 | - | - | - | - | - | | 0.5286 | 120 | 2.1176 | - | - | - | - | - | | 0.5507 | 125 | 2.4249 | - | - | - | - | - | | 0.5727 | 130 | 3.3705 | - | - | - | - | - | | 0.5947 | 135 | 1.551 | - | - | - | - | - | | 0.6167 | 140 | 1.328 | - | - | - | - | - | | 0.6388 | 145 | 1.9353 | - | - | - | - | - | | 0.6608 | 150 | 2.4254 | - | - | - | - | - | | 0.6828 | 155 | 1.8436 | - | - | - | - | - | | 0.7048 | 160 | 1.1937 | - | - | - | - | - | | 0.7269 | 165 | 2.164 | - | - | - | - | - | | 0.7489 | 170 | 2.2921 | - | - | - | - | - | | 0.7709 | 175 | 2.4385 | - | - | - | - | - | | 0.7930 | 180 | 1.2392 | - | - | - | - | - | | 0.8150 | 185 | 1.0472 | - | - | 
- | - | - | | 0.8370 | 190 | 1.5844 | - | - | - | - | - | | 0.8590 | 195 | 1.2492 | - | - | - | - | - | | 0.8811 | 200 | 1.6774 | - | - | - | - | - | | 0.9031 | 205 | 2.485 | - | - | - | - | - | | 0.9251 | 210 | 2.4781 | - | - | - | - | - | | 0.9471 | 215 | 2.4476 | - | - | - | - | - | | 0.9692 | 220 | 2.6243 | - | - | - | - | - | | 0.9912 | 225 | 1.3651 | - | - | - | - | - | | 1.0 | 227 | - | 0.9066 | 0.9112 | 0.9257 | 0.8906 | 0.9182 | | 1.0132 | 230 | 1.0575 | - | - | - | - | - | | 1.0352 | 235 | 1.4499 | - | - | - | - | - | | 1.0573 | 240 | 1.4333 | - | - | - | - | - | | 1.0793 | 245 | 1.1148 | - | - | - | - | - | | 1.1013 | 250 | 1.259 | - | - | - | - | - | | 1.1233 | 255 | 0.873 | - | - | - | - | - | | 1.1454 | 260 | 1.646 | - | - | - | - | - | | 1.1674 | 265 | 1.7583 | - | - | - | - | - | | 1.1894 | 270 | 1.2268 | - | - | - | - | - | | 1.2115 | 275 | 1.3792 | - | - | - | - | - | | 1.2335 | 280 | 2.5662 | - | - | - | - | - | | 1.2555 | 285 | 1.5021 | - | - | - | - | - | | 1.2775 | 290 | 1.1399 | - | - | - | - | - | | 1.2996 | 295 | 1.3307 | - | - | - | - | - | | 1.3216 | 300 | 0.7458 | - | - | - | - | - | | 1.3436 | 305 | 1.1029 | - | - | - | - | - | | 1.3656 | 310 | 1.0205 | - | - | - | - | - | | 1.3877 | 315 | 1.0998 | - | - | - | - | - | | 1.4097 | 320 | 0.8304 | - | - | - | - | - | | 1.4317 | 325 | 1.3673 | - | - | - | - | - | | 1.4537 | 330 | 2.4445 | - | - | - | - | - | | 1.4758 | 335 | 2.8757 | - | - | - | - | - | | 1.4978 | 340 | 1.7879 | - | - | - | - | - | | 1.5198 | 345 | 1.1255 | - | - | - | - | - | | 1.5419 | 350 | 1.6743 | - | - | - | - | - | | 1.5639 | 355 | 1.3803 | - | - | - | - | - | | 1.5859 | 360 | 1.1998 | - | - | - | - | - | | 1.6079 | 365 | 1.2129 | - | - | - | - | - | | 1.6300 | 370 | 1.6588 | - | - | - | - | - | | 1.6520 | 375 | 0.9827 | - | - | - | - | - | | 1.6740 | 380 | 0.605 | - | - | - | - | - | | 1.6960 | 385 | 1.2934 | - | - | - | - | - | | 1.7181 | 390 | 1.1776 | - | - | - | - | - | | 1.7401 | 395 | 1.445 | - | - | - | - | - 
| | 1.7621 | 400 | 0.6393 | - | - | - | - | - | | 1.7841 | 405 | 0.9303 | - | - | - | - | - | | 1.8062 | 410 | 0.7541 | - | - | - | - | - | | 1.8282 | 415 | 0.5413 | - | - | - | - | - | | 1.8502 | 420 | 1.5258 | - | - | - | - | - | | 1.8722 | 425 | 1.4257 | - | - | - | - | - | | 1.8943 | 430 | 1.3111 | - | - | - | - | - | | 1.9163 | 435 | 1.6604 | - | - | - | - | - | | 1.9383 | 440 | 1.4004 | - | - | - | - | - | | 1.9604 | 445 | 2.7186 | - | - | - | - | - | | 1.9824 | 450 | 2.2757 | - | - | - | - | - | | 2.0 | 454 | - | 0.9401 | 0.9433 | 0.9387 | 0.9386 | 0.9416 | | 2.0044 | 455 | 0.9345 | - | - | - | - | - | | 2.0264 | 460 | 0.9325 | - | - | - | - | - | | 2.0485 | 465 | 1.2434 | - | - | - | - | - | | 2.0705 | 470 | 1.5161 | - | - | - | - | - | | 2.0925 | 475 | 2.6011 | - | - | - | - | - | | 2.1145 | 480 | 1.8276 | - | - | - | - | - | | 2.1366 | 485 | 1.5005 | - | - | - | - | - | | 2.1586 | 490 | 0.8618 | - | - | - | - | - | | 2.1806 | 495 | 2.1422 | - | - | - | - | - | | 2.2026 | 500 | 1.3922 | - | - | - | - | - | | 2.2247 | 505 | 1.5939 | - | - | - | - | - | | 2.2467 | 510 | 1.3021 | - | - | - | - | - | | 2.2687 | 515 | 1.0825 | - | - | - | - | - | | 2.2907 | 520 | 0.9066 | - | - | - | - | - | | 2.3128 | 525 | 0.7717 | - | - | - | - | - | | 2.3348 | 530 | 1.1484 | - | - | - | - | - | | 2.3568 | 535 | 1.6513 | - | - | - | - | - | | 2.3789 | 540 | 1.7267 | - | - | - | - | - | | 2.4009 | 545 | 0.7659 | - | - | - | - | - | | 2.4229 | 550 | 2.0213 | - | - | - | - | - | | 2.4449 | 555 | 0.5329 | - | - | - | - | - | | 2.4670 | 560 | 1.2083 | - | - | - | - | - | | 2.4890 | 565 | 1.5432 | - | - | - | - | - | | 2.5110 | 570 | 0.5423 | - | - | - | - | - | | 2.5330 | 575 | 0.2613 | - | - | - | - | - | | 2.5551 | 580 | 0.7985 | - | - | - | - | - | | 2.5771 | 585 | 0.3003 | - | - | - | - | - | | 2.5991 | 590 | 2.2234 | - | - | - | - | - | | 2.6211 | 595 | 0.4772 | - | - | - | - | - | | 2.6432 | 600 | 1.0158 | - | - | - | - | - | | 2.6652 | 605 | 2.6385 | - | - | - | - | - | | 
2.6872 | 610 | 0.7042 | - | - | - | - | - |
| 2.7093 | 615 | 1.1469 | - | - | - | - | - |
| 2.7313 | 620 | 1.4092 | - | - | - | - | - |
| 2.7533 | 625 | 0.6487 | - | - | - | - | - |
| 2.7753 | 630 | 1.218 | - | - | - | - | - |
| 2.7974 | 635 | 1.1509 | - | - | - | - | - |
| 2.8194 | 640 | 1.1524 | - | - | - | - | - |
| 2.8414 | 645 | 0.6477 | - | - | - | - | - |
| 2.8634 | 650 | 0.6295 | - | - | - | - | - |
| 2.8855 | 655 | 1.3026 | - | - | - | - | - |
| 2.9075 | 660 | 1.9196 | - | - | - | - | - |
| 2.9295 | 665 | 1.3743 | - | - | - | - | - |
| 2.9515 | 670 | 0.8934 | - | - | - | - | - |
| 2.9736 | 675 | 1.1801 | - | - | - | - | - |
| 2.9956 | 680 | 1.2952 | - | - | - | - | - |
| 3.0 | 681 | - | 0.9538 | 0.9513 | 0.9538 | 0.9414 | 0.9435 |
| 3.0176 | 685 | 0.3324 | - | - | - | - | - |
| 3.0396 | 690 | 0.9551 | - | - | - | - | - |
| 3.0617 | 695 | 0.9315 | - | - | - | - | - |
| 3.0837 | 700 | 1.3611 | - | - | - | - | - |
| 3.1057 | 705 | 1.4406 | - | - | - | - | - |
| 3.1278 | 710 | 0.5888 | - | - | - | - | - |
| 3.1498 | 715 | 0.9149 | - | - | - | - | - |
| 3.1718 | 720 | 0.5627 | - | - | - | - | - |
| 3.1938 | 725 | 1.6876 | - | - | - | - | - |
| 3.2159 | 730 | 1.1366 | - | - | - | - | - |
| 3.2379 | 735 | 1.3571 | - | - | - | - | - |
| 3.2599 | 740 | 1.5227 | - | - | - | - | - |
| 3.2819 | 745 | 2.5139 | - | - | - | - | - |
| 3.3040 | 750 | 0.3735 | - | - | - | - | - |
| 3.3260 | 755 | 1.4386 | - | - | - | - | - |
| 3.3480 | 760 | 0.3838 | - | - | - | - | - |
| 3.3700 | 765 | 0.3973 | - | - | - | - | - |
| 3.3921 | 770 | 1.4972 | - | - | - | - | - |
| 3.4141 | 775 | 1.5118 | - | - | - | - | - |
| 3.4361 | 780 | 0.478 | - | - | - | - | - |
| 3.4581 | 785 | 1.5982 | - | - | - | - | - |
| 3.4802 | 790 | 0.6209 | - | - | - | - | - |
| 3.5022 | 795 | 0.5902 | - | - | - | - | - |
| 3.5242 | 800 | 1.0877 | - | - | - | - | - |
| 3.5463 | 805 | 0.9553 | - | - | - | - | - |
| 3.5683 | 810 | 0.3054 | - | - | - | - | - |
| 3.5903 | 815 | 1.2229 | - | - | - | - | - |
| 3.6123 | 820 | 0.7434 | - | - | - | - | - |
| 3.6344 | 825 | 1.5447 | - | - | - | - | - |
| 3.6564 | 830 | 1.0751 | - | - | - | - | - |
| 3.6784 | 835 | 0.8161 | - | - | - | - | - |
| 3.7004 | 840 | 0.4382 | - | - | - | - | - |
| 3.7225 | 845 | 1.3547 | - | - | - | - | - |
| 3.7445 | 850 | 1.7112 | - | - | - | - | - |
| 3.7665 | 855 | 0.5362 | - | - | - | - | - |
| 3.7885 | 860 | 0.9309 | - | - | - | - | - |
| 3.8106 | 865 | 1.8301 | - | - | - | - | - |
| 3.8326 | 870 | 1.5554 | - | - | - | - | - |
| 3.8546 | 875 | 1.4035 | - | - | - | - | - |
| 3.8767 | 880 | 1.5814 | - | - | - | - | - |
| 3.8987 | 885 | 0.7283 | - | - | - | - | - |
| 3.9207 | 890 | 1.8549 | - | - | - | - | - |
| 3.9427 | 895 | 0.196 | - | - | - | - | - |
| 3.9648 | 900 | 1.2072 | - | - | - | - | - |
| 3.9868 | 905 | 0.83 | - | - | - | - | - |
| 4.0 | 908 | - | 0.9564 | 0.9587 | 0.9612 | 0.9488 | 0.9563 |
| 4.0088 | 910 | 1.7222 | - | - | - | - | - |
| 4.0308 | 915 | 0.6728 | - | - | - | - | - |
| 4.0529 | 920 | 0.9388 | - | - | - | - | - |
| 4.0749 | 925 | 0.7998 | - | - | - | - | - |
| 4.0969 | 930 | 1.1561 | - | - | - | - | - |
| 4.1189 | 935 | 2.4315 | - | - | - | - | - |
| 4.1410 | 940 | 1.3263 | - | - | - | - | - |
| 4.1630 | 945 | 1.2374 | - | - | - | - | - |
| 4.1850 | 950 | 1.1307 | - | - | - | - | - |
| 4.2070 | 955 | 0.5512 | - | - | - | - | - |
| 4.2291 | 960 | 1.3266 | - | - | - | - | - |
| 4.2511 | 965 | 1.2306 | - | - | - | - | - |
| 4.2731 | 970 | 1.7083 | - | - | - | - | - |
| 4.2952 | 975 | 0.7028 | - | - | - | - | - |
| 4.3172 | 980 | 1.2987 | - | - | - | - | - |
| 4.3392 | 985 | 1.545 | - | - | - | - | - |
| 4.3612 | 990 | 1.004 | - | - | - | - | - |
| 4.3833 | 995 | 0.8276 | - | - | - | - | - |
| 4.4053 | 1000 | 1.4694 | - | - | - | - | - |
| 4.4273 | 1005 | 0.4914 | - | - | - | - | - |
| 4.4493 | 1010 | 0.9894 | - | - | - | - | - |
| 4.4714 | 1015 | 0.8855 | - | - | - | - | - |
| 4.4934 | 1020 | 1.1339 | - | - | - | - | - |
| 4.5154 | 1025 | 1.0786 | - | - | - | - | - |
| 4.5374 | 1030 | 1.2547 | - | - | - | - | - |
| 4.5595 | 1035 | 0.5312 | - | - | - | - | - |
| 4.5815 | 1040 | 1.4938 | - | - | - | - | - |
| 4.6035 | 1045 | 0.8124 | - | - | - | - | - |
| 4.6256 | 1050 | 1.2401 | - | - | - | - | - |
| 4.6476 | 1055 | 1.1902 | - | - | - | - | - |
| 4.6696 | 1060 | 1.4183 | - | - | - | - | - |
| 4.6916 | 1065 | 1.0718 | - | - | - | - | - |
| 4.7137 | 1070 | 1.2203 | - | - | - | - | - |
| 4.7357 | 1075 | 0.8535 | - | - | - | - | - |
| 4.7577 | 1080 | 1.2454 | - | - | - | - | - |
| 4.7797 | 1085 | 0.4216 | - | - | - | - | - |
| 4.8018 | 1090 | 0.8327 | - | - | - | - | - |
| 4.8238 | 1095 | 1.2371 | - | - | - | - | - |
| 4.8458 | 1100 | 1.0949 | - | - | - | - | - |
| 4.8678 | 1105 | 1.2177 | - | - | - | - | - |
| 4.8899 | 1110 | 0.6236 | - | - | - | - | - |
| 4.9119 | 1115 | 0.646 | - | - | - | - | - |
| 4.9339 | 1120 | 1.1822 | - | - | - | - | - |
| 4.9559 | 1125 | 1.0471 | - | - | - | - | - |
| 4.9780 | 1130 | 0.7626 | - | - | - | - | - |
| **5.0** | **1135** | **0.9794** | **0.9564** | **0.9563** | **0.9616** | **0.9488** | **0.9587** |

* The bold row denotes the saved checkpoint.
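To compare the per-epoch evaluation rows without scanning the full log, the rows can be parsed programmatically. The following is a minimal sketch, not part of the original training code. It assumes the first three columns are epoch, step, and training loss (the usual layout in Sentence Transformers model cards) and keeps the five evaluation columns as an unnamed list, because their headers are not repeated in this part of the log. The function name `parse_log_rows` is hypothetical; the sample rows are copied from the table above.

```python
def parse_log_rows(text):
    """Parse pipe-delimited log rows into dicts; a '-' cell becomes None."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, split on the inner ones, strip bold markers.
        cells = [c.strip().strip("*") for c in line.strip().strip("|").split("|")]
        values = [None if c == "-" else float(c) for c in cells]
        rows.append({
            "epoch": values[0],      # assumed column meaning
            "step": int(values[1]),  # assumed column meaning
            "loss": values[2],       # assumed column meaning; None on some rows
            "metrics": values[3:],   # evaluation columns, headers not shown here
        })
    return rows


# Four rows copied from the training log above.
sample = """
| 2.9956 | 680 | 1.2952 | - | - | - | - | - |
| 3.0 | 681 | - | 0.9538 | 0.9513 | 0.9538 | 0.9414 | 0.9435 |
| 4.0 | 908 | - | 0.9564 | 0.9587 | 0.9612 | 0.9488 | 0.9563 |
| **5.0** | **1135** | **0.9794** | **0.9564** | **0.9563** | **0.9616** | **0.9488** | **0.9587** |
"""

rows = parse_log_rows(sample)

# Evaluation rows are the ones whose metric cells are filled in.
eval_rows = [r for r in rows if r["metrics"][0] is not None]
for r in eval_rows:
    print(r["step"], r["metrics"])
```

Filtering on a filled metric cell rather than on a whole-number epoch keeps the sketch independent of how often evaluation was scheduled.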
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```