metadata
base_model: l3cube-pune/assamese-bert
datasets:
  - sentence-transformers/all-nli
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:557850
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: A man is jumping unto his filthy bed.
    sentences:
      - A young male is looking at a newspaper while 2 females walks past him.
      - The bed is dirty.
      - The man is on the moon.
  - source_sentence: >-
      A carefully balanced male stands on one foot near a clean ocean beach
      area.
    sentences:
      - A man is ouside near the beach.
      - Three policemen patrol the streets on bikes
      - A man is sitting on his couch.
  - source_sentence: The man is wearing a blue shirt.
    sentences:
      - Near the trashcan the man stood and smoked
      - >-
        A man in a blue shirt leans on a wall beside a road with a blue van and
        red car with water in the background.
      - A man in a black shirt is playing a guitar.
  - source_sentence: The girls are outdoors.
    sentences:
      - Two girls riding on an amusement part ride.
      - a guy laughs while doing laundry
      - >-
        Three girls are standing together in a room, one is listening, one is
        writing on a wall and the third is talking to them.
  - source_sentence: >-
      A construction worker peeking out of a manhole while his coworker sits on
      the sidewalk smiling.
    sentences:
      - A worker is looking out of a manhole.
      - A man is giving a presentation.
      - The workers are both inside the manhole.
model-index:
  - name: SentenceTransformer based on l3cube-pune/assamese-bert
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8448431188558219
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.848270397607023
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8429962459024234
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8461225961159852
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8450811877325317
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8481702238714027
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7600437454974306
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7604490741243843
            name: Spearman Dot
          - type: pearson_max
            value: 0.8450811877325317
            name: Pearson Max
          - type: spearman_max
            value: 0.848270397607023
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8160018744466311
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8230016183156494
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8104201802445242
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8104000391884387
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8108715587588242
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8112881633291651
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.7088828153549986
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6991542788989243
            name: Spearman Dot
          - type: pearson_max
            value: 0.8160018744466311
            name: Pearson Max
          - type: spearman_max
            value: 0.8230016183156494
            name: Spearman Max

SentenceTransformer based on l3cube-pune/assamese-bert

This is a sentence-transformers model fine-tuned from l3cube-pune/assamese-bert on the sentence-transformers/all-nli dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

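Model Description

  • Model type: Sentence Transformer
  • Base model: l3cube-pune/assamese-bert
  • Maximum sequence length: 512 tokens
  • Output dimensionality: 768
  • Training dataset: sentence-transformers/all-nli
  • Language: en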

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
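The Pooling module above sets `pooling_mode_mean_tokens: True`. Each sentence vector is therefore the average of its token embeddings, with padded positions excluded through the attention mask.

The following is a minimal NumPy sketch of masked mean pooling. It assumes token embeddings shaped (batch, seq_len, 768) and a 0/1 attention mask. The helper `mean_pool` is hypothetical and is not the library's Pooling implementation.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions (hypothetical helper).

    token_embeddings: (batch, seq_len, dim) output of the transformer
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # real-token counts, never zero
    return summed / counts

# Toy batch: 2 sentences, 4 token positions, 768-dim vectors (random stand-in values)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 0],   # first sentence ends with one padding token
                 [1, 1, 1, 1]])
sentence_vectors = mean_pool(tokens, mask)
print(sentence_vectors.shape)  # (2, 768)
```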

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pritamdeka/assamese-bert-nli-v2")
# Run inference
sentences = [
    'A construction worker peeking out of a manhole while his coworker sits on the sidewalk smiling.',
    'A worker is looking out of a manhole.',
    'The workers are both inside the manhole.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768): one 768-dimensional vector per sentence

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
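For semantic search, the same embeddings can rank candidate sentences against a query. The following is a minimal NumPy sketch under these assumptions:

- Similarity is cosine similarity, which is assumed here as the default and matches the `cos_sim` used by the training loss.
- The 3-dimensional vectors are hypothetical stand-ins for real 768-dimensional `model.encode` outputs.
- `rank_by_cosine` is a hypothetical helper.

```python
import numpy as np

def rank_by_cosine(query: np.ndarray, candidates: np.ndarray) -> list[tuple[int, float]]:
    """Return (candidate index, cosine score) pairs, most similar first (hypothetical helper)."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity of each candidate with the query
    order = np.argsort(-scores)      # candidate indices by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy stand-ins for model.encode(...) outputs (hypothetical values)
query = np.array([1.0, 0.2, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.0],   # close paraphrase
    [0.0, 0.0, 1.0],   # unrelated
    [0.5, 0.5, 0.0],   # partially related
])
for idx, score in rank_by_cosine(query, candidates):
    print(idx, round(score, 3))
```

Normalizing once and taking a matrix product scales to many candidates. For large corpora, an approximate nearest-neighbour index would usually replace the full sort.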

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric Value
pearson_cosine 0.8448
spearman_cosine 0.8483
pearson_manhattan 0.843
spearman_manhattan 0.8461
pearson_euclidean 0.8451
spearman_euclidean 0.8482
pearson_dot 0.76
spearman_dot 0.7604
pearson_max 0.8451
spearman_max 0.8483
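Each row above is a correlation between model similarity scores and gold similarity labels for sentence pairs. The following is a minimal sketch of how such numbers can be computed. It assumes each pair is scored four ways:

- cosine similarity
- negated Manhattan distance
- negated Euclidean distance
- dot product

Pearson and Spearman correlations with the gold labels are then taken for each scoring. The embeddings and gold scores are hypothetical toy data. Spearman is computed as Pearson on ranks, without tie averaging.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v: np.ndarray) -> np.ndarray:
    # Position of each value in sorted order; assumes no tied values
    return np.argsort(np.argsort(v))

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    return pearson(ranks(x), ranks(y))

def pair_scores(a: np.ndarray, b: np.ndarray) -> dict[str, np.ndarray]:
    """Score row-aligned embedding pairs with four similarity functions."""
    dot = np.sum(a * b, axis=1)
    return {
        "cosine": dot / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)),
        "manhattan": -np.abs(a - b).sum(axis=1),      # negated: higher means more similar
        "euclidean": -np.linalg.norm(a - b, axis=1),  # negated: higher means more similar
        "dot": dot,
    }

# Hypothetical data: 50 sentence pairs with 16-dim embeddings
rng = np.random.default_rng(42)
emb1 = rng.normal(size=(50, 16))
emb2 = emb1 + rng.normal(scale=0.8, size=(50, 16))
gold = 5 - np.linalg.norm(emb1 - emb2, axis=1)  # toy gold scores tied to closeness

metrics = {}
for name, scores in pair_scores(emb1, emb2).items():
    metrics[f"pearson_{name}"] = pearson(scores, gold)
    metrics[f"spearman_{name}"] = spearman(scores, gold)
for key, value in metrics.items():
    print(f"{key} {value:.4f}")
```

The toy gold scores are defined from Euclidean distance, so the Euclidean rows come out as exactly 1.0. The block only illustrates the computation and says nothing about this model's quality.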

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.816
spearman_cosine 0.823
pearson_manhattan 0.8104
spearman_manhattan 0.8104
pearson_euclidean 0.8109
spearman_euclidean 0.8113
pearson_dot 0.7089
spearman_dot 0.6992
pearson_max 0.816
spearman_max 0.823
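In both tables, each `_max` row equals the best of the corresponding cosine, Manhattan, Euclidean and dot-product values. For example, sts-dev pearson_max matches pearson_euclidean, while the other three `_max` values match cosine. The following check uses the full-precision values from the model-index metadata at the top of this card.

```python
# Full-precision values copied from the model-index metadata of this card
reported = {
    "sts-dev": {
        "pearson": {"cosine": 0.8448431188558219, "manhattan": 0.8429962459024234,
                    "euclidean": 0.8450811877325317, "dot": 0.7600437454974306,
                    "max": 0.8450811877325317},
        "spearman": {"cosine": 0.848270397607023, "manhattan": 0.8461225961159852,
                     "euclidean": 0.8481702238714027, "dot": 0.7604490741243843,
                     "max": 0.848270397607023},
    },
    "sts-test": {
        "pearson": {"cosine": 0.8160018744466311, "manhattan": 0.8104201802445242,
                    "euclidean": 0.8108715587588242, "dot": 0.7088828153549986,
                    "max": 0.8160018744466311},
        "spearman": {"cosine": 0.8230016183156494, "manhattan": 0.8104000391884387,
                     "euclidean": 0.8112881633291651, "dot": 0.6991542788989243,
                     "max": 0.8230016183156494},
    },
}

SIMILARITY_FNS = ("cosine", "manhattan", "euclidean", "dot")
for split, by_corr in reported.items():
    for corr, values in by_corr.items():
        best_fn = max(SIMILARITY_FNS, key=lambda fn: values[fn])
        assert values["max"] == values[best_fn]  # the _max row is the best of the four
        print(f"{split} {corr}_max = {values['max']:.4f} (from {best_fn})")
# sts-dev pearson_max = 0.8451 (from euclidean)
# sts-dev spearman_max = 0.8483 (from cosine)
# sts-test pearson_max = 0.8160 (from cosine)
# sts-test spearman_max = 0.8230 (from cosine)
```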

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column | anchor | positive | negative
    type | string | string | string
    min | 7 tokens | 6 tokens | 5 tokens
    mean | 10.55 tokens | 13.08 tokens | 13.7 tokens
    max | 48 tokens | 40 tokens | 53 tokens
  • Samples:
    anchor | positive | negative
    A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette.
    Children smiling and waving at camera | There are children present | The kids are frowning
    A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick. | The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
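MultipleNegativesRankingLoss treats every other candidate in the batch as a negative for a given anchor. The following is a minimal NumPy sketch for (anchor, positive, negative) triplets, assuming the parameters above (scale 20.0, cosine similarity):

1. Each anchor is scored against all positives and all negatives in the batch.
2. The scores are multiplied by the scale.
3. Cross-entropy is taken with the anchor's own positive as the target.

The function names and toy embeddings are hypothetical. This is an illustration of the idea, not the library's implementation.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, negatives: np.ndarray,
             scale: float = 20.0) -> float:
    """In-batch softmax loss: anchor i should score highest against positive i."""
    candidates = np.concatenate([positives, negatives], axis=0)  # (2B, dim)
    logits = scale * cosine_matrix(anchors, candidates)          # (B, 2B)
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = np.arange(len(anchors))                            # positive i sits in column i
    return float(-log_probs[targets, targets].mean())

rng = np.random.default_rng(0)
batch, dim = 4, 8
anchors = rng.normal(size=(batch, dim))
positives = anchors + 0.1 * rng.normal(size=(batch, dim))  # near-duplicates of the anchors
negatives = rng.normal(size=(batch, dim))                   # unrelated vectors
print(mnr_loss(anchors, positives, negatives))  # low: each true positive stands out
print(mnr_loss(anchors, negatives, positives))  # high: targets swapped with unrelated vectors
```

Larger batches supply more in-batch negatives per anchor, which is one reason this loss is usually paired with a fairly large batch size such as the 64 used here.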
    

Evaluation Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column | anchor | positive | negative
    type | string | string | string
    min | 6 tokens | 4 tokens | 5 tokens
    mean | 18.54 tokens | 9.97 tokens | 10.59 tokens
    max | 74 tokens | 30 tokens | 29 tokens
  • Samples:
    anchor | positive | negative
    Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli.
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school.
    A man selling donuts to a customer during a world exhibition event held in the city of Angeles | A man selling donuts to a customer. | A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
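The "Approximate statistics" tables summarize token counts over the first 1,000 rows of each column. The following is a minimal sketch of that kind of summary.

The card's numbers presumably come from the model's own tokenizer. Here a plain whitespace split stands in for it, so the counts will not match the tables above. `column_stats` is a hypothetical helper.

```python
from statistics import mean
from typing import Callable

def column_stats(texts: list[str], tokenize: Callable[[str], list[str]],
                 limit: int = 1000) -> dict[str, float]:
    """Min, mean and max token counts over the first `limit` texts (hypothetical helper)."""
    lengths = [len(tokenize(t)) for t in texts[:limit]]
    return {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}

# Anchor texts from the evaluation samples above; str.split stands in for a real tokenizer
eval_anchors = [
    "Two women are embracing while holding to go packages.",
    "Two young children in blue jerseys, one with the number 9 and one with the number 2 "
    "are standing on wooden steps in a bathroom and washing their hands in a sink.",
    "A man selling donuts to a customer during a world exhibition event held in the city of Angeles",
]
print(column_stats(eval_anchors, str.split))
```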
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
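One epoch over 557,850 training triplets at a batch size of 64 gives ceil(557850 / 64) = 8,717 optimizer steps. This is consistent with the final step in the training logs, and with a single device (the card does not state the device count).

The following sketches the resulting learning-rate schedule, assuming:

- the `linear` scheduler, `learning_rate: 5e-05` and `warmup_ratio: 0.1` from All Hyperparameters;
- warmup steps rounded up with `math.ceil`, which is an assumption about the Trainer's rounding.

`linear_schedule_lr` is a hypothetical helper.

```python
import math

def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

samples, batch_size, epochs = 557_850, 64, 1
total_steps = math.ceil(samples / batch_size) * epochs  # 8717
warmup_steps = math.ceil(0.1 * total_steps)            # 872 (assumed ceil rounding)
peak_lr = 5e-05

for step in (0, 436, 872, 4794, 8717):
    print(step, f"{linear_schedule_lr(step, total_steps, warmup_steps, peak_lr):.2e}")
```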

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
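`batch_sampler: no_duplicates` matters for MultipleNegativesRankingLoss. If the same sentence appeared twice in one batch, one copy would be scored as a negative for the other.

The following is a simplified sketch of the idea, not Sentence Transformers' actual sampler. Rows are added greedily to the current batch only when none of their texts is already in it. Skipped rows are retried in a later batch. The helper name and toy rows are hypothetical.

```python
def no_duplicate_batches(rows: list[tuple[str, ...]], batch_size: int) -> list[list[int]]:
    """Group row indices into batches in which no text appears twice (hypothetical helper)."""
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(rows[idx])
            if len(batch) < batch_size and not texts & seen:
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry this row in a later batch
        batches.append(batch)         # never empty: the first remaining row always fits
        remaining = deferred
    return batches

# Toy (anchor, positive, negative) rows; row 1 repeats row 0's anchor
rows = [
    ("A man is eating.", "Someone eats.", "A man sleeps."),
    ("A man is eating.", "A person has food.", "Nobody eats."),
    ("Kids play outside.", "Children are outdoors.", "Kids are asleep."),
    ("A dog runs.", "An animal moves.", "The dog sleeps."),
]
print(no_duplicate_batches(rows, batch_size=2))
```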

Training Logs

Epoch | Step | Training loss | Validation loss | sts-dev spearman_cosine | sts-test spearman_cosine
0 | 0 | - | - | 0.6401 | -
0.0574 | 500 | 2.5567 | 1.2774 | 0.7654 | -
0.1147 | 1000 | 1.3874 | 1.0303 | 0.7997 | -
0.1721 | 1500 | 1.1493 | 0.9597 | 0.7867 | -
0.2294 | 2000 | 0.9885 | 0.7656 | 0.7895 | -
0.2868 | 2500 | 0.9588 | 0.8041 | 0.7797 | -
0.3442 | 3000 | 0.922 | 0.7280 | 0.7785 | -
0.4015 | 3500 | 0.8693 | 0.6803 | 0.7925 | -
0.4589 | 4000 | 0.8436 | 0.6892 | 0.7866 | -
0.5162 | 4500 | 0.8033 | 0.7127 | 0.7818 | -
0.5736 | 5000 | 0.8061 | 0.6854 | 0.7746 | -
0.6310 | 5500 | 0.8069 | 0.6496 | 0.7856 | -
0.6883 | 6000 | 0.8133 | 0.6490 | 0.7787 | -
0.7457 | 6500 | 0.7857 | 0.5926 | 0.8010 | -
0.8030 | 7000 | 0.4404 | 0.4472 | 0.8457 | -
0.8604 | 7500 | 0.3422 | 0.4441 | 0.8473 | -
0.9177 | 8000 | 0.308 | 0.4315 | 0.8494 | -
0.9751 | 8500 | 0.299 | 0.4305 | 0.8483 | -
1.0 | 8717 | - | - | - | 0.8230
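The Epoch column is the step count divided by the 8,717 steps of the single epoch. The short check below runs over the logged rows and also finds the step with the highest sts-dev Spearman cosine; values are copied from the table above.

`load_best_model_at_end` is False in the hyperparameters, so the Trainer keeps the final-step weights. It does not reload this best-scoring checkpoint.

```python
TOTAL_STEPS = 8717  # final step of the single training epoch

# (epoch, step, sts-dev spearman_cosine) copied from the training log above
log = [
    (0.0, 0, 0.6401), (0.0574, 500, 0.7654), (0.1147, 1000, 0.7997),
    (0.1721, 1500, 0.7867), (0.2294, 2000, 0.7895), (0.2868, 2500, 0.7797),
    (0.3442, 3000, 0.7785), (0.4015, 3500, 0.7925), (0.4589, 4000, 0.7866),
    (0.5162, 4500, 0.7818), (0.5736, 5000, 0.7746), (0.6310, 5500, 0.7856),
    (0.6883, 6000, 0.7787), (0.7457, 6500, 0.8010), (0.8030, 7000, 0.8457),
    (0.8604, 7500, 0.8473), (0.9177, 8000, 0.8494), (0.9751, 8500, 0.8483),
]

for epoch, step, _ in log:
    # Epoch is the completed fraction of the epoch, shown to 4 decimals
    assert abs(step / TOTAL_STEPS - epoch) < 5e-5

best_epoch, best_step, best_score = max(log, key=lambda row: row[2])
print(f"best sts-dev spearman_cosine: {best_score} at step {best_step} (epoch {best_epoch})")
# best sts-dev spearman_cosine: 0.8494 at step 8000 (epoch 0.9177)
```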

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}