---
language: []
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dataset_size:1K<n<10K
---

# SentenceTransformer

This is a [sentence-transformers](https://www.SBERT.net) model trained on query–product triplets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NomicBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'primes',
    'Newton',
    'big boys sneakers',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Triplet

* Dataset: `esci-dev`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value      |
|:-------------------|:-----------|
| cosine_accuracy    | 0.6414     |
| dot_accuracy       | 0.3664     |
| manhattan_accuracy | 0.6404     |
| euclidean_accuracy | 0.6407     |
| **max_accuracy**   | **0.6414** |
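The table above can be regenerated with the same evaluator class. A minimal sketch, assuming the `esci-dev` triplets are available as three parallel Python lists; the example strings are taken from the samples shown later in this card:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("sentence_transformers_model_id")

# Illustrative triplets; in practice, load the full esci-dev split.
queries = ["girls white shoes"]
positives = ["adidas Women's Coast Star Shoes, ftwr White/Silver Met./ core Black, 6 M US"]
negatives = ["Converse Optical White M7650 - HI TOP Size 6 M US Women / 4 M US Men"]

evaluator = TripletEvaluator(
    anchors=queries,
    positives=positives,
    negatives=negatives,
    name="esci-dev",
)
results = evaluator(model)
print(results)  # e.g. {'esci-dev_cosine_accuracy': ..., 'esci-dev_max_accuracy': ...}
```

Each accuracy is the fraction of triplets where the query embedding is closer to the positive than to the negative under the corresponding distance function.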
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 9,090 training samples
* Columns: `query`, `pos`, and `neg`
* Approximate statistics based on the first 1000 samples:

  |         | query  | pos    | neg    |
  |:--------|:-------|:-------|:-------|
  | type    | string | string | string |
  | details |        |        |        |

* Samples:

  | query | pos | neg |
  |:------|:----|:----|
  | 1 3/4 inch tooled belt strap without belt buckle | BS3501 Solid Brass Leaf Belt Buckle Fits 1-3/4"(45mm) Wide Belt | Nocona Men's Hired Brown Floral Eagle, 40 |
  | 7edge phone case peacock | Galaxy S7 Edge Case for Girls Women Clear with Flowers Design Shockproof Protective Cell Phone Cases for Samsung Galaxy S7 Edge 5.5 Inch Cute Floral Pattern Print Flexible Slim Fit Bumper Rubber Cover | Galaxy S7 Case, Galaxy S7 Phone Case with HD Screen Protector for Girls Women, Gritup Cute Clear Gradient Glitter Liquid TPU Slim Phone Case for Samsung Galaxy S7 Teal/Purple |
  | girls white shoes | adidas Women's Coast Star Shoes, ftwr White/Silver Met./ core Black, 6 M US | Converse Optical White M7650 - HI TOP Size 6 M US Women / 4 M US Men |

* Loss: [CachedGISTEmbedLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:

  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.01}
  ```

### Evaluation Dataset

#### Unnamed Dataset

* Size: 3,985 evaluation samples
* Columns: `query`, `pos`, and `neg`
* Approximate statistics based on the first 1000 samples:

  |         | query  | pos    | neg    |
  |:--------|:-------|:-------|:-------|
  | type    | string | string | string |
  | details |        |        |        |

* Samples:

  | query | pos | neg |
  |:------|:----|:----|
  | colors for dining room | AOOS CUSTOM Dimmable LED Neon Signs for Home Bedroom Salon Dining Room Wall Decor (Customization: Texts, Designs, Logos, Languages, Colors, Sizes, Fonts, Color-Changing) (24" / 1 Line Text) | Jetec 5 Pieces EAT Sign Kitchen Wood Rustic Sign Arrow Wall Decor EAT Farmhouse Decoration Hanging Arrow Wooden Sign for Kitchen Wall Home Dining Room (Charming Color) |
  | mix no 6 heels for women | DREAM PAIRS Women's Hi-Chunk Gold Glitter High Heel Pump Sandals - 6 M US | Fashare Womens High Heels Pointed Toe Bowtie Back Ankle Buckle Strap Wedding Evening Party Dress Pumps Shoes |
  | goxlrmini | Singing Machine SMM-205 Unidirectional Dynamic Microphone with 10 Ft. Cord, Black, one size | Behringer U-Phoria Studio Pro Complete Recording Bundle with UMC202HD USB Audio Interface - With 20' 6mm Rubber XLR Microphone Cable, On-Stage MBS5000 Broadcast/Webcast Boom Arm with XLR Cable |

* Loss: [CachedGISTEmbedLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) with these parameters:

  ```json
  {'guide': SentenceTransformer(
    (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
    (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
    (2): Normalize()
  ), 'temperature': 0.01}
  ```
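Both splits use CachedGISTEmbedLoss, which scores in-batch negatives with a small guide model to filter out false negatives, and caches embedding gradients so the effective batch size is not limited by GPU memory. A minimal construction sketch; the guide checkpoint is not named in the dump above, so `all-MiniLM-L6-v2` (a 384-dimensional BERT with mean pooling and normalization, matching the printed architecture) is an assumption:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

model = SentenceTransformer("sentence_transformers_model_id")

# Assumption: all-MiniLM-L6-v2 as the guide; only the guide's architecture
# (384-dim BERT, mean pooling, normalize) is recorded above, not its name.
guide = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

loss = CachedGISTEmbedLoss(
    model=model,
    guide=guide,       # scores candidate in-batch negatives to drop false ones
    temperature=0.01,  # matches the logged loss parameters
)
```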
### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 10
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates

These values map one-to-one onto `SentenceTransformerTrainingArguments`, as shown in the sketch below.
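A minimal sketch of passing these non-default values in Sentence Transformers 3.x; the `output_dir` is illustrative:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/esci-triplet-model",  # illustrative path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no repeated texts within a batch
)
```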
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
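Putting the pieces together, a hedged end-to-end sketch of the training setup described above. The base checkpoint and guide model are assumptions (the card records only the NomicBertModel architecture), and the toy dataset stands in for the real 9,090-sample split:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CachedGISTEmbedLoss

# Toy triplets standing in for the training split described above;
# column order (query, pos, neg) is how the loss reads anchor/positive/negative.
train_dataset = Dataset.from_dict({
    "query": ["girls white shoes"],
    "pos": ["adidas Women's Coast Star Shoes, ftwr White/Silver Met./ core Black, 6 M US"],
    "neg": ["Converse Optical White M7650 - HI TOP Size 6 M US Women / 4 M US Men"],
})

# Assumptions: nomic-embed-text-v1 matches the NomicBertModel architecture printed
# above; all-MiniLM-L6-v2 stands in for the unrecorded 384-dim guide model.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
guide = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=CachedGISTEmbedLoss(model=model, guide=guide, temperature=0.01),
    # args=...  # pass the SentenceTransformerTrainingArguments from the sketch above
)
trainer.train()
```

Passing the evaluation split via `eval_dataset=` and the TripletEvaluator from earlier via `evaluator=` would reproduce the logged run below, up to the unknown base checkpoint.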
### Training Logs

| Epoch  | Step | Training Loss | esci-dev_max_accuracy |
|:------:|:----:|:-------------:|:---------------------:|
| 0      | 0    | -             | 0.6414                |
| 0.1757 | 100  | 0.8875        | -                     |
| 0.3515 | 200  | 0.5281        | -                     |
| 0.5272 | 300  | 0.4621        | -                     |
| 0.7030 | 400  | 0.4669        | -                     |
| 0.8787 | 500  | 0.4501        | -                     |
| 1.0545 | 600  | 0.5379        | -                     |
| 1.2302 | 700  | 0.4288        | -                     |
| 1.4060 | 800  | 0.2112        | -                     |
| 1.5817 | 900  | 0.1508        | -                     |
| 1.7575 | 1000 | 0.1133        | -                     |
| 1.9332 | 1100 | 0.1312        | -                     |
| 2.1090 | 1200 | 0.0784        | -                     |
| 2.2847 | 1300 | 0.0983        | -                     |
| 2.4605 | 1400 | 0.106         | -                     |
| 2.6362 | 1500 | 0.1058        | -                     |
| 2.8120 | 1600 | 0.0673        | -                     |
| 2.9877 | 1700 | 0.0355        | -                     |
| 3.1634 | 1800 | 0.0175        | -                     |
| 3.3392 | 1900 | 0.0366        | -                     |
| 3.5149 | 2000 | 0.0332        | -                     |
| 3.6907 | 2100 | 0.0682        | -                     |
| 3.8664 | 2200 | 0.0378        | -                     |
| 4.0422 | 2300 | 0.0239        | -                     |
| 4.2179 | 2400 | 0.0282        | -                     |
| 4.3937 | 2500 | 0.0401        | -                     |
| 4.5694 | 2600 | 0.0268        | -                     |
| 4.7452 | 2700 | 0.0208        | -                     |
| 4.9209 | 2800 | 0.0117        | -                     |
| 5.0967 | 2900 | 0.0045        | -                     |
| 5.2724 | 3000 | 0.0145        | -                     |
| 5.4482 | 3100 | 0.029         | -                     |
| 5.6239 | 3200 | 0.0009        | -                     |
| 5.7996 | 3300 | 0.0033        | -                     |
| 5.9754 | 3400 | 0.0088        | -                     |
| 6.1511 | 3500 | 0.0014        | -                     |
| 6.3269 | 3600 | 0.0027        | -                     |
| 6.5026 | 3700 | 0.0021        | -                     |
| 6.6784 | 3800 | 0.0001        | -                     |
| 6.8541 | 3900 | 0.0025        | -                     |
| 7.0299 | 4000 | 0.0059        | -                     |
| 7.2056 | 4100 | 0.0025        | -                     |
| 7.3814 | 4200 | 0.0029        | -                     |
| 7.5571 | 4300 | 0.0007        | -                     |
| 7.7329 | 4400 | 0.0018        | -                     |
| 7.9086 | 4500 | 0.0032        | -                     |
| 8.0844 | 4600 | 0.0007        | -                     |
| 8.2601 | 4700 | 0.0027        | -                     |
| 8.4359 | 4800 | 0.0027        | -                     |
| 8.6116 | 4900 | 0.0           | -                     |
| 8.7873 | 5000 | 0.0025        | -                     |
| 8.9631 | 5100 | 0.0025        | -                     |
| 9.1388 | 5200 | 0.0014        | -                     |
| 9.3146 | 5300 | 0.0027        | -                     |
| 9.4903 | 5400 | 0.0021        | -                     |
| 9.6661 | 5500 | 0.0           | -                     |
| 9.8418 | 5600 | 0.0025        | -                     |

### Framework Versions

- Python: 3.10.12
- Sentence Transformers: 3.0.0
- Transformers: 4.38.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.27.2
- Datasets: 2.19.1
- Tokenizers: 0.15.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```