---
language:
- en
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated
base_model: microsoft/mpnet-base
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: 'Really? No kidding! '
  sentences:
  - yeah really no kidding
  - At the end of the fourth century was when baked goods flourished.
  - The campaigns seem to reach a new pool of contributors.
- source_sentence: A sleeping man.
  sentences:
  - Two men are sleeping.
  - Someone is selling oranges
  - the family is young
- source_sentence: a guy on a bike
  sentences:
  - A tall person on a bike
  - A man is on a frozen lake.
  - The women throw food at the kids
- source_sentence: yeah really no kidding
  sentences:
  - oh uh-huh well no they wouldn't would they no
  - yeah i mean just when uh the they military paid for her education
  - The campaigns seem to reach a new pool of contributors.
- source_sentence: He ran like an athlete.
  sentences:
  - ' Then he ran.'
  - yeah i mean just when uh the they military paid for her education
  - Similarly, OIM revised the electronic Grant Renewal Application to accommodate new information sought by LSC and to ensure greater ease for users.
pipeline_tag: sentence-similarity
co2_eq_emissions:
  emissions: 17.515467907816664
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.13
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: SentenceTransformer based on microsoft/mpnet-base
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.7331234146933103
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7435439430716654
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7389474504545281
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7473580293303098
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7356264396007131
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7436137284782617
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7093073700072118
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7150453113301433
      name: Spearman Dot
    - type: pearson_max
      value: 0.7389474504545281
      name: Pearson Max
    - type: spearman_max
      value: 0.7473580293303098
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.6750510843835755
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6615639695746663
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6718085205234632
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6589482932175834
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6693170762111229
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6578210069410166
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6490291380804283
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6335192601696299
      name: Spearman Dot
    - type: pearson_max
      value: 0.6750510843835755
      name: Pearson Max
    - type: spearman_max
      value: 0.6615639695746663
      name: Spearman Max
---

# SentenceTransformer based on microsoft/mpnet-base

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the
[multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli), [snli](https://huggingface.co/datasets/stanfordnlp/snli) and [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts) datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base)
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Training Datasets:**
  - [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli)
  - [snli](https://huggingface.co/datasets/stanfordnlp/snli)
  - [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts)
- **Language:** en

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/st-v3-test-mpnet-base-allnli-stsb")
# Run inference
sentences = [
    "He ran like an athlete.",
    " Then he ran.",
    "yeah i mean just when uh the they military paid for her education",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
```
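The embeddings can be compared directly to obtain similarity scores. Continuing from the snippet above, here is a minimal sketch using `sentence_transformers.util.cos_sim`; the comments describe expected behaviour, not measured output.

```python
from sentence_transformers import util

# `embeddings` is the (3, 768) array produced by model.encode(...) above.
scores = util.cos_sim(embeddings, embeddings)

# scores[i][j] is the cosine similarity between sentences[i] and sentences[j];
# the first two sentences ("He ran like an athlete." / " Then he ran.") should
# score higher against each other than either does against the third.
print(scores.shape)
# torch.Size([3, 3])
```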
## Evaluation

### Metrics

#### Semantic Similarity

* Dataset: `sts-dev`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.7331     |
| **spearman_cosine** | **0.7435** |
| pearson_manhattan   | 0.7389     |
| spearman_manhattan  | 0.7474     |
| pearson_euclidean   | 0.7356     |
| spearman_euclidean  | 0.7436     |
| pearson_dot         | 0.7093     |
| spearman_dot        | 0.715      |
| pearson_max         | 0.7389     |
| spearman_max        | 0.7474     |

#### Semantic Similarity

* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.6751     |
| **spearman_cosine** | **0.6616** |
| pearson_manhattan   | 0.6718     |
| spearman_manhattan  | 0.6589     |
| pearson_euclidean   | 0.6693     |
| spearman_euclidean  | 0.6578     |
| pearson_dot         | 0.649      |
| spearman_dot        | 0.6335     |
| pearson_max         | 0.6751     |
| spearman_max        | 0.6616     |

## Training Details

### Training Datasets

#### multi_nli

* Dataset: [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli) at [da70db2](https://huggingface.co/datasets/nyu-mll/multi_nli/tree/da70db2af9d09693783c3320c4249840212ee221)
* Size: 10,000 training samples
* Columns: `premise`, `hypothesis`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | premise | hypothesis | label |
  |:--------|:--------|:-----------|:------|
  | type    | string  | string     | int   |
  | details |         |            |       |
* Samples:
  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | Conceptually cream skimming has two basic dimensions - product and geography. | Product and geography are what make cream skimming work. | 1 |
  | you know during the season and i guess at at your level uh you lose them to the next level if if they decide to recall the the parent team the Braves decide to call to recall a guy from triple A then a double A guy goes up to replace him and a single A guy goes up to replace him | You lose the things to the following level if the people recall. | 0 |
  | One of our number will carry out your instructions minutely. | A member of my team will execute your orders with immense precision. | 0 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### snli

* Dataset: [snli](https://huggingface.co/datasets/stanfordnlp/snli) at [cdb5c3d](https://huggingface.co/datasets/stanfordnlp/snli/tree/cdb5c3d5eed6ead6e5a341c8e56e669bb666725b)
* Size: 10,000 training samples
* Columns: `snli_premise`, `hypothesis`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | snli_premise | hypothesis | label |
  |:--------|:-------------|:-----------|:------|
  | type    | string       | string     | int   |
  | details |              |            |       |
* Samples:
  | snli_premise | hypothesis | label |
  |:-------------|:-----------|:------|
  | A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1 |
  | A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2 |
  | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### stsb

* Dataset: [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts) at [8913289](https://huggingface.co/datasets/mteb/stsbenchmark-sts/tree/8913289635987208e6e7c72789e4be2fe94b6abd)
* Size: 5,749 training samples
* Columns: `sentence1`, `sentence2`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details |           |           |       |
* Samples:
  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | A plane is taking off. | An air plane is taking off. | 1.0 |
  | A man is playing a large flute. | A man is playing a flute. | 0.76 |
  | A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76 |
* Loss: [sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss](https://sbert.net/docs/package_reference/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
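The two losses referenced above can be instantiated roughly as follows. This is a minimal sketch rather than the exact training code: it rebuilds a `microsoft/mpnet-base` Sentence Transformer with mean pooling (matching the architecture section) and assigns one loss per training dataset name, which is how multi-dataset training is typically configured.

```python
from sentence_transformers import SentenceTransformer, models, losses

# Base model with mean pooling, matching the architecture shown above.
word_embedding_model = models.Transformer("microsoft/mpnet-base", max_seq_length=384)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), pooling_mode="mean"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# SoftmaxLoss classifies the 3-way NLI labels on top of the embeddings;
# CosineSimilarityLoss regresses the 0-1 STS Benchmark scores with MSE.
nli_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)
stsb_loss = losses.CosineSimilarityLoss(model=model)

# One loss per training dataset, keyed by the dataset names used in this card.
train_losses = {"multi_nli": nli_loss, "snli": nli_loss, "stsb": stsb_loss}
```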
### Evaluation Datasets

#### multi_nli

* Dataset: [multi_nli](https://huggingface.co/datasets/nyu-mll/multi_nli) at [da70db2](https://huggingface.co/datasets/nyu-mll/multi_nli/tree/da70db2af9d09693783c3320c4249840212ee221)
* Size: 100 evaluation samples
* Columns: `premise`, `hypothesis`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | premise | hypothesis | label |
  |:--------|:--------|:-----------|:------|
  | type    | string  | string     | int   |
  | details |         |            |       |
* Samples:
  | premise | hypothesis | label |
  |:--------|:-----------|:------|
  | The new rights are nice enough | Everyone really likes the newest benefits | 1 |
  | This site includes a list of all award winners and a searchable database of Government Executive articles. | The Government Executive articles housed on the website are not able to be searched. | 2 |
  | uh i don't know i i have mixed emotions about him uh sometimes i like him but at the same times i love to see somebody beat him | I like him for the most part, but would still enjoy seeing someone beat him. | 0 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### snli

* Dataset: [snli](https://huggingface.co/datasets/stanfordnlp/snli) at [cdb5c3d](https://huggingface.co/datasets/stanfordnlp/snli/tree/cdb5c3d5eed6ead6e5a341c8e56e669bb666725b)
* Size: 9,842 evaluation samples
* Columns: `snli_premise`, `hypothesis`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | snli_premise | hypothesis | label |
  |:--------|:-------------|:-----------|:------|
  | type    | string       | string     | int   |
  | details |              |            |       |
* Samples:
  | snli_premise | hypothesis | label |
  |:-------------|:-----------|:------|
  | Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1 |
  | Two women are embracing while holding to go packages. | Two woman are holding packages. | 0 |
  | Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2 |
* Loss: [sentence_transformers.losses.SoftmaxLoss.SoftmaxLoss](https://sbert.net/docs/package_reference/losses.html#softmaxloss)

#### stsb

* Dataset: [stsb](https://huggingface.co/datasets/mteb/stsbenchmark-sts) at [8913289](https://huggingface.co/datasets/mteb/stsbenchmark-sts/tree/8913289635987208e6e7c72789e4be2fe94b6abd)
* Size: 1,500 evaluation samples
* Columns: `sentence1`, `sentence2`, and `label`
* Approximate statistics based on the first 1000 samples:
  |         | sentence1 | sentence2 | label |
  |:--------|:----------|:----------|:------|
  | type    | string    | string    | float |
  | details |           |           |       |
* Samples:
  | sentence1 | sentence2 | label |
  |:----------|:----------|:------|
  | A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0 |
  | A young child is riding a horse. | A child is riding a horse. | 0.95 |
  | A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0 |
* Loss: [sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss](https://sbert.net/docs/package_reference/losses.html#cosinesimilarityloss) with these parameters:
  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss"
  }
  ```
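The `sts-dev` metrics reported under Evaluation come from an `EmbeddingSimilarityEvaluator` run on this stsb evaluation split. Below is a rough sketch of how such an evaluator can be built; the column names and the 0-5 to 0-1 score normalization are assumptions based on the `mteb/stsbenchmark-sts` dataset, not a copy of the original evaluation code.

```python
from datasets import load_dataset
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# STS Benchmark development split; raw scores are on a 0-5 scale and are
# normalized to 0-1 to match the labels shown in the table above.
stsb_dev = load_dataset("mteb/stsbenchmark-sts", split="validation")

dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb_dev["sentence1"],
    sentences2=stsb_dev["sentence2"],
    scores=[score / 5.0 for score in stsb_dev["score"]],
    name="sts-dev",
)
# dev_evaluator(model) computes the Pearson/Spearman correlations reported above.
```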
### Training Hyperparameters

#### Non-Default Hyperparameters

- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- seed: 33
- bf16: True
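For context, these non-default values map onto training arguments along the following lines. This is a sketch only: the class name assumes the `SentenceTransformerTrainingArguments` API that ships with the Sentence Transformers trainer, and `output_dir` is a placeholder.

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-allnli-stsb",  # placeholder output path
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=33,
    bf16=True,  # requires bfloat16-capable hardware, e.g. the RTX 3090 used here
)
```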
#### All Hyperparameters

<details><summary>Click to expand</summary>

- overwrite_output_dir: False
- do_predict: False
- prediction_loss_only: False
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 33
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: None
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- round_robin_sampler: False

</details>
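Putting the pieces together, the multi-dataset setup described in this card can be wired up roughly as follows. This condensed sketch reuses the `model`, `train_losses`, `args`, and `dev_evaluator` objects from the sketches above; the 10,000-sample subsets, the SNLI label filtering, and the STS Benchmark score normalization are assumptions chosen to mirror the dataset sizes and columns documented here, not the exact preprocessing that was used.

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformerTrainer

# Subsets matching the sizes listed under "Training Datasets".
multi_nli = (
    load_dataset("nyu-mll/multi_nli", split="train")
    .select_columns(["premise", "hypothesis", "label"])
    .select(range(10_000))
)
snli = (
    load_dataset("stanfordnlp/snli", split="train")
    .filter(lambda example: example["label"] != -1)  # drop pairs without a gold label
    .select(range(10_000))
)
stsb = (
    load_dataset("mteb/stsbenchmark-sts", split="train")
    .select_columns(["sentence1", "sentence2", "score"])
    .map(lambda example: {"label": example["score"] / 5.0}, remove_columns=["score"])
)

# An eval_dataset dict mirroring the evaluation datasets above can be passed as well.
trainer = SentenceTransformerTrainer(
    model=model,              # from the loss sketch above
    args=args,                # from the training-arguments sketch above
    train_dataset={"multi_nli": multi_nli, "snli": snli, "stsb": stsb},
    loss=train_losses,        # one loss per dataset, keyed by name
    evaluator=dev_evaluator,  # sts-dev evaluator from the sketch above
)
trainer.train()
model.save("models/mpnet-base-allnli-stsb/final")  # placeholder path
```

With `round_robin_sampler: False`, batches should be drawn from the three datasets in proportion to their sizes rather than by cycling through them evenly.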
### Training Logs

| Epoch  | Step | Training Loss | multi_nli loss | snli loss | stsb loss | sts-dev spearman cosine |
|:------:|:----:|:-------------:|:--------------:|:---------:|:---------:|:-----------------------:|
| 0.0493 | 10   | 0.9199        | 1.1019         | 1.1017    | 0.3016    | 0.6324                  |
| 0.0985 | 20   | 1.0063        | 1.1000         | 1.0966    | 0.2635    | 0.6093                  |
| 0.1478 | 30   | 1.002         | 1.0995         | 1.0908    | 0.1766    | 0.5328                  |
| 0.1970 | 40   | 0.7946        | 1.0980         | 1.0913    | 0.0923    | 0.5991                  |
| 0.2463 | 50   | 0.9891        | 1.0967         | 1.0781    | 0.0912    | 0.6457                  |
| 0.2956 | 60   | 0.784         | 1.0938         | 1.0699    | 0.0934    | 0.6629                  |
| 0.3448 | 70   | 0.6735        | 1.0940         | 1.0728    | 0.0640    | 0.7538                  |
| 0.3941 | 80   | 0.7713        | 1.0893         | 1.0676    | 0.0612    | 0.7653                  |
| 0.4433 | 90   | 0.9772        | 1.0870         | 1.0573    | 0.0636    | 0.7621                  |
| 0.4926 | 100  | 0.8613        | 1.0862         | 1.0515    | 0.0632    | 0.7583                  |
| 0.5419 | 110  | 0.7528        | 1.0814         | 1.0397    | 0.0617    | 0.7536                  |
| 0.5911 | 120  | 0.6541        | 1.0854         | 1.0329    | 0.0657    | 0.7512                  |
| 0.6404 | 130  | 1.051         | 1.0658         | 1.0211    | 0.0607    | 0.7340                  |
| 0.6897 | 140  | 0.8516        | 1.0631         | 1.0171    | 0.0587    | 0.7467                  |
| 0.7389 | 150  | 0.7484        | 1.0563         | 1.0122    | 0.0556    | 0.7537                  |
| 0.7882 | 160  | 0.7368        | 1.0534         | 1.0100    | 0.0588    | 0.7526                  |
| 0.8374 | 170  | 0.8373        | 1.0498         | 1.0030    | 0.0565    | 0.7491                  |
| 0.8867 | 180  | 0.9311        | 1.0387         | 0.9981    | 0.0588    | 0.7302                  |
| 0.9360 | 190  | 0.5445        | 1.0357         | 0.9967    | 0.0565    | 0.7382                  |
| 0.9852 | 200  | 0.9154        | 1.0359         | 0.9964    | 0.0556    | 0.7435                  |

### Environmental Impact

Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).

- **Carbon Emitted**: 0.018 kg of CO2
- **Hours Used**: 0.13 hours

### Training Hardware

- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions

- Python: 3.11.6
- Sentence Transformers: 2.7.0.dev0
- Transformers: 4.39.3
- PyTorch: 2.1.0+cu121
- Accelerate: 0.26.1
- Datasets: 2.18.0
- Tokenizers: 0.15.2

## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```