SentenceTransformer based on sentence-transformers/distilbert-base-nli-mean-tokens
This is a sentence-transformers model fine-tuned from sentence-transformers/distilbert-base-nli-mean-tokens. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/distilbert-base-nli-mean-tokens
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
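The figures above can be verified directly from the loaded model; a minimal sketch using the public SentenceTransformer API (same model ID as in the Usage section below):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("DivyaMereddy007/RecipeBert_v5original_epoc50Copy_of_TrainSetenceTransforme-Finetuning_v5_DistilledBert")
print(model.max_seq_length)                      # 128
print(model.get_sentence_embedding_dimension())  # 768
```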
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
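The Pooling module is configured for mean pooling (pooling_mode_mean_tokens: True), so a sentence embedding is the average of the DistilBERT token embeddings over non-padding positions. A minimal sketch of that operation, not the library's exact implementation:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768) from DistilBertModel
    # attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum embeddings of real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per sentence
    return summed / counts                         # (batch, 768) sentence embeddings
```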
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("DivyaMereddy007/RecipeBert_v5original_epoc50Copy_of_TrainSetenceTransforme-Finetuning_v5_DistilledBert")
# Run inference
sentences = [
    'Watermelon Rind Pickles ["7 lb. watermelon rind", "7 c. sugar", "2 c. apple vinegar", "1/2 tsp. oil of cloves", "1/2 tsp. oil of cinnamon"] ["Trim off green and pink parts of watermelon rind; cut to 1-inch cubes.", "Parboil until tender, but not soft.", "Drain. Combine sugar, vinegar, oil of cloves and oil of cinnamon; bring to boiling and pour over rind.", "Let stand overnight.", "In the morning, drain off syrup.", "Heat and put over rind.", "The third morning, heat rind and syrup; seal in hot, sterilized jars.", "Makes 8 pints.", "(Oil of cinnamon and clove keeps rind clear and transparent.)"]',
    'Summer Chicken ["1 pkg. chicken cutlets", "1/2 c. oil", "1/3 c. red vinegar", "2 Tbsp. oregano", "2 Tbsp. garlic salt"] ["Double recipe for more chicken."]',
    'Summer Spaghetti ["1 lb. very thin spaghetti", "1/2 bottle McCormick Salad Supreme (seasoning)", "1 bottle Zesty Italian dressing"] ["Prepare spaghetti per package.", "Drain.", "Melt a little butter through it.", "Marinate overnight in Salad Supreme and Zesty Italian dressing.", "Just before serving, add cucumbers, tomatoes, green peppers, mushrooms, olives or whatever your taste may want."]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
Training Details
Training Dataset
Unnamed Dataset
- Size: 1,746 training samples
- Columns: sentence_0, sentence_1, and label
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | min: 63 tokens, mean: 118.82 tokens, max: 128 tokens | min: 63 tokens, mean: 118.59 tokens, max: 128 tokens | min: 0.0, mean: 0.19, max: 1.0 |
- Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Tuna Macaroni Casserole ["1 box macaroni and cheese", "1 can tuna, drained", "1 small jar pimentos", "1 medium onion, chopped"] ["Prepare macaroni and cheese as directed.", "Add drained tuna, pimento and onion.", "Mix.", "Serve hot or cold."] | Easy Fudge ["1 (14 oz.) can sweetened condensed milk", "1 (12 oz.) pkg. semi-sweet chocolate chips", "1 (1 oz.) sq. unsweetened chocolate (if desired)", "1 1/2 c. chopped nuts (if desired)", "1 tsp. vanilla"] ["Butter a square pan, 8 x 8 x 2-inches.", "Heat milk, chocolate chips and unsweetened chocolate over low heat, stirring constantly, until chocolate is melted and mixture is smooth. Remove from heat.", "Stir in nuts and vanilla.", "Spread in pan."] | 0.05 |
| Scalloped Corn ["1 can cream-style corn", "1 can whole kernel corn", "1/2 pkg. (approximately 20) saltine crackers, crushed", "1 egg, beaten", "6 tsp. butter, divided", "pepper to taste"] ["Mix together both cans of corn, crackers, egg, 2 teaspoons of melted butter and pepper and place in a buttered baking dish.", "Dot with remaining 4 teaspoons of butter.", "Bake at 350\u00b0 for 1 hour."] | Quick Peppermint Puffs ["8 marshmallows", "2 Tbsp. margarine, melted", "1/4 c. crushed peppermint candy", "1 can crescent rolls"] ["Dip marshmallows in melted margarine; roll in candy. Wrap a crescent triangle around each marshmallow, completely covering the marshmallow and square edges of dough tightly to seal.", "Dip in margarine and place in a greased muffin tin.", "Bake at 375\u00b0 for 10 to 15 minutes; remove from pan."] | 0.1 |
| Beer Bread ["3 c. self rising flour", "1 - 12 oz. can beer", "1 Tbsp. sugar"] ["Stir the ingredients together and put in a greased and floured loaf pan.", "Bake at 425 degrees for 50 minutes.", "Drizzle melted butter on top."] | Rhubarb Coffee Cake ["1 1/2 c. sugar", "1/2 c. butter", "1 egg", "1 c. buttermilk", "2 c. flour", "1/2 tsp. salt", "1 tsp. soda", "1 c. buttermilk", "2 c. rhubarb, finely cut", "1 tsp. vanilla"] ["Cream sugar and butter.", "Add egg and beat well.", "To creamed butter, sugar and egg, add alternately buttermilk with mixture of flour, salt and soda.", "Mix well.", "Add rhubarb and vanilla.", "Pour into greased 9 x 13-inch pan and add Topping."] | 0.4 |
- Loss: CosineSimilarityLoss with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
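In other words, the two sentence embeddings are compared by cosine similarity and regressed against the float label with MSE. A schematic of the objective, not the library's exact code:

```python
import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb_0: torch.Tensor, emb_1: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # emb_0, emb_1: (batch, 768) embeddings of sentence_0 and sentence_1
    # labels:       (batch,) target similarities in [0, 1]
    cos = F.cosine_similarity(emb_0, emb_1, dim=1)  # predicted similarity per pair
    return F.mse_loss(cos, labels)                  # loss_fct = torch.nn.MSELoss
```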
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 50
- multi_dataset_batch_sampler: round_robin
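A sketch of how this configuration could be reproduced with the Sentence Transformers v3 trainer; the dataset rows are abbreviated placeholders and the output directory is hypothetical:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("sentence-transformers/distilbert-base-nli-mean-tokens")

# Placeholder rows in the card's (sentence_0, sentence_1, label) format.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Beer Bread ...", "Scalloped Corn ..."],
    "sentence_1": ["Rhubarb Coffee Cake ...", "Quick Peppermint Puffs ..."],
    "label": [0.4, 0.1],
})

args = SentenceTransformerTrainingArguments(
    output_dir="recipe-distilbert",  # hypothetical path
    num_train_epochs=50,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()
```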
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 50
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 4.5455 | 500 | 0.0092 |
| 9.0909 | 1000 | 0.0091 |
| 13.6364 | 1500 | 0.0081 |
| 18.1818 | 2000 | 0.0074 |
| 22.7273 | 2500 | 0.0071 |
| 27.2727 | 3000 | 0.0069 |
| 31.8182 | 3500 | 0.0066 |
| 36.3636 | 4000 | 0.0065 |
| 40.9091 | 4500 | 0.0061 |
| 45.4545 | 5000 | 0.006 |
| 50.0 | 5500 | 0.0056 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.0+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.2
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}