gsm-finetunned / README.md
metadata
base_model: sentence-transformers/all-mpnet-base-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:2400
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: Are there any furniture stores? (variation 536)
    sentences:
      - >-
        Event tickets can be purchased at the customer service desk or online
        through the mall's website.
      - The Apple Store is located on the second floor near the food court.
      - >-
        Yes, there are furniture stores including IKEA and Ashley Furniture,
        both located on the second floor.
  - source_sentence: Is there a play area for kids? (variation 121)
    sentences:
      - >-
        The customer service desk is located on the ground floor near the main
        entrance.
      - >-
        Yes, there is a play area for kids on the first floor near the west
        entrance.
      - >-
        Yes, there is a luggage store on the second floor near the central
        atrium.
  - source_sentence: Are there any sports stores? (variation 931)
    sentences:
      - Yes, there is a toy store on the first floor near the west entrance.
      - >-
        Event tickets can be purchased at the customer service desk or online
        through the mall's website.
      - >-
        Yes, there are sports stores including Nike and Adidas, both located on
        the first floor.
  - source_sentence: Where can I charge my phone? (variation 904)
    sentences:
      - >-
        Yes, reservations for 'The Gourmet Palace' can be made by calling their
        direct line or via their website.
      - >-
        Yes, there is a photography studio on the first floor near the main
        entrance.
      - >-
        Phone charging stations are available throughout the mall, including
        near the food court and at the customer service desk.
  - source_sentence: Does the mall have a post office? (variation 1412)
    sentences:
      - >-
        Yes, there is a photography studio on the first floor near the main
        entrance.
      - Yes, there is a game arcade on the third floor next to the cinema.
      - Yes, there is a post office on the ground floor near the west entrance.

SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the train dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
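
As an illustration of what the Pooling (mean over tokens) and Normalize modules listed above do, here is a minimal pure-Python sketch. The helper names `mean_pool` and `l2_normalize` and the 2-dimensional toy vectors are assumptions made for readability; the actual model works on 768-dimensional token embeddings produced by the MPNet transformer.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]

def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input (hypothetical): three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the two real tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because the final vector has unit length, the dot product of two such embeddings equals their cosine similarity.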

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("anomys/gsm-finetunned")
# Run inference
sentences = [
    'Does the mall have a post office? (variation 1412)',
    'Yes, there is a post office on the ground floor near the west entrance.',
    'Yes, there is a game arcade on the third floor next to the cinema.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
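
For a question-answering use case like the mall FAQ examples, the similarity scores would typically be used to rank candidate responses for a question. The sketch below shows that ranking step in pure Python. The function `rank_by_cosine` is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins for the 768-dimensional output of `model.encode`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (hypothetical): one question and three candidate responses.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],   # unrelated direction
    [0.8, 0.6, 0.0],   # closest to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]

ranking = rank_by_cosine(query, candidates)
best_index, best_score = ranking[0]
print(best_index, round(best_score, 2))
```

With real embeddings, `ranking[0][0]` would be the index of the response the model considers the best match for the question.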

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 2,400 training samples
  • Columns: question and response
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 12 tokens, mean 15.28 tokens, max 21 tokens
    • response (string): min 16 tokens, mean 21.73 tokens, max 33 tokens
  • Samples:
    • question: Where can I find an ATM in the mall? (variation 643)
      response: ATMs are located on the ground floor next to the information desk and near the west entrance.
    • question: Is there a map of the mall available? (variation 701)
      response: Yes, you can find interactive maps on our website and physical maps at the information desks located at each entrance.
    • question: Where can I find the customer service desk? (variation 227)
      response: The customer service desk is located on the ground floor near the main entrance.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
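
To make the loss parameters concrete, here is a minimal pure-Python sketch of the idea behind MultipleNegativesRankingLoss with `scale = 20.0` and cosine similarity: within a batch, each question should score highest against its own response, and every other response in the batch serves as a negative. The function name `mnr_loss` and the 2-dimensional toy embeddings are assumptions for illustration; this is not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings (hypothetical): two questions and two responses.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each response matches its question
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each response matches the other question

loss_aligned = mnr_loss(questions, aligned)
loss_swapped = mnr_loss(questions, swapped)
print(loss_aligned, loss_swapped)
```

In this toy case the aligned batch gives a loss very close to zero, while the swapped batch gives a loss of about 20, which is the scaled similarity gap between the wrong and the right response.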
    

Evaluation Dataset

train

  • Dataset: train
  • Size: 600 evaluation samples
  • Columns: question and response
  • Approximate statistics based on the first 1000 samples:
    • question (string): min 12 tokens, mean 15.22 tokens, max 21 tokens
    • response (string): min 16 tokens, mean 21.35 tokens, max 33 tokens
  • Samples:
    • question: Are there any opticians in the mall? (variation 1802)
      response: Yes, there are opticians including LensCrafters and Visionworks, both located on the first floor.
    • question: Is there a map of the mall available? (variation 1191)
      response: Yes, you can find interactive maps on our website and physical maps at the information desks located at each entrance.
    • question: Are there any wheelchair-accessible entrances? (variation 1818)
      response: Yes, all main entrances are wheelchair accessible, and we provide complimentary wheelchair rentals at the customer service desk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
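
The `no_duplicates` batch sampler matters for this kind of data because many questions share the same response, and a repeated response inside a batch would be treated as a negative for its own question. The following is a simplified pure-Python sketch of the idea, not the library's sampler: the function `no_duplicate_batches` and the toy samples are hypothetical.

```python
def no_duplicate_batches(samples, batch_size):
    """Greedily build batches in which no text appears twice.

    `samples` is a list of (question, response) pairs. A sample is deferred
    to a later batch if the current batch is full or already contains one
    of its texts. A final, smaller batch is still returned.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = list(range(len(samples)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for index in remaining:
            texts = set(samples[index])
            if len(batch) < batch_size and not (texts & seen):
                batch.append(index)
                seen |= texts
            else:
                deferred.append(index)
        batches.append(batch)
        remaining = deferred
    return batches

# Toy samples (hypothetical): two questions per response.
samples = [
    ("Where is the ATM? (variation 1)", "ATMs are on the ground floor."),
    ("Where is the ATM? (variation 2)", "ATMs are on the ground floor."),
    ("Is there a post office? (variation 3)", "Yes, on the ground floor."),
    ("Is there a post office? (variation 4)", "Yes, on the ground floor."),
]

batches = no_duplicate_batches(samples, batch_size=2)
print(batches)
```

In the toy run, the two samples that share a response end up in different batches, so neither response is used as a false negative for the other question.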

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
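
The step count and learning-rate schedule implied by these settings can be worked out with a short calculation. The sketch below assumes training on a single device and uses the listed values (2,400 training samples, batch size 16, 1 epoch, `gradient_accumulation_steps` 1, `warmup_ratio` 0.1, `learning_rate` 5e-05, linear scheduler). The function `linear_lr` is a hypothetical helper that mirrors the usual linear warmup and decay shape, not a call into the training library.

```python
import math

NUM_SAMPLES = 2400    # training set size
BATCH_SIZE = 16       # per_device_train_batch_size (single device assumed)
NUM_EPOCHS = 1        # num_train_epochs
WARMUP_RATIO = 0.1    # warmup_ratio
PEAK_LR = 5e-05       # learning_rate

total_steps = math.ceil(NUM_SAMPLES / BATCH_SIZE) * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def linear_lr(step):
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    decay_span = max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / decay_span)

print(total_steps, warmup_steps)
for step in (0, warmup_steps, 50, 100, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the run has 150 optimizer steps, which agrees with the final step reported in the training logs, and the learning rate peaks after 15 warmup steps.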

Training Logs

Epoch   Step  Training Loss  train loss
0.3333  50    0.0083         0.0000
0.6667  100   0.0            0.0000
1.0     150   0.0            0.0000

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}