metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:1K<n<10K
  - loss:CosineSimilarityLoss
base_model: GroNLP/bert-base-dutch-cased
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
widget:
  - source_sentence: A woman is dancing.
    sentences:
      - Two little girls in pink are dancing.
      - A green bus drives down a road.
      - House with a red door.
  - source_sentence: A man jumping rope
    sentences:
      - The man without a shirt is jumping.
      - Three people are walking a dog.
      - A woman is taking a picture.
  - source_sentence: A man is spitting.
    sentences:
      - A man is cutting paper.
      - A person is combing a cat hair.
      - A small baby is playing a guitar.
  - source_sentence: A plane in the sky.
    sentences:
      - Two airplanes in the sky.
      - Breivik complains of 'ridicule'
      - Three men are playing guitars.
  - source_sentence: A plane is landing.
    sentences:
      - A animated airplane is landing.
      - A woman is applying eye shadow.
      - Kenya SC upholds election result
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on GroNLP/bert-base-dutch-cased
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.7663399220180294
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7663585273609937
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7553858135652205
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7596894750291403
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.756020111318255
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7600411026249633
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.724987833867276
            name: Pearson Dot
          - type: spearman_dot
            value: 0.7281058086742583
            name: Spearman Dot
          - type: pearson_max
            value: 0.7663399220180294
            name: Pearson Max
          - type: spearman_max
            value: 0.7663585273609937
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.7338660712008959
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.7216786912799816
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7191458672763532
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7089185758914616
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7203342460991101
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7104087588860777
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6972437145317183
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6881333441399748
            name: Spearman Dot
          - type: pearson_max
            value: 0.7338660712008959
            name: Pearson Max
          - type: spearman_max
            value: 0.7216786912799816
            name: Spearman Max

SentenceTransformer based on GroNLP/bert-base-dutch-cased

This is a sentence-transformers model fine-tuned from GroNLP/bert-base-dutch-cased on the sentence-transformers/stsb dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

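Model Description

  • Model type: Sentence Transformer
  • Base model: GroNLP/bert-base-dutch-cased
  • Maximum sequence length: 512 tokens
  • Output dimensionality: 768 dimensions
  • Training dataset: sentence-transformers/stsb
  • Language: en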

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
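The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the function name mean_pool and the toy 2-dimensional vectors are illustrative assumptions, and it assumes padding positions are marked with 0 in an attention mask.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding (mask 0) is ignored."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three real tokens and one padding position,
# with 2 dimensions standing in for the model's 768.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]

sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [3.0, 4.0]
```

The padding vector is excluded from both the sum and the count, which is why the large values in the last row do not affect the result.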

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("threehook/bert-base-dutch-cased-sts")
# Run inference
sentences = [
    'A plane is landing.',
    'A animated airplane is landing.',
    'A woman is applying eye shadow.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
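To make the 3 x 3 similarity matrix concrete, here is a small sketch of a pairwise cosine similarity computation in plain Python. It assumes the model's similarity function is cosine similarity (the library default when none is configured); the helper names cosine and similarity_matrix and the toy vectors are illustrative, not part of the library.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings: List[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity between embedding i and embedding j."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


# Toy 2-dimensional "embeddings" standing in for the 768-dimensional ones.
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = similarity_matrix(toy)

for row in sims:
    print([round(x, 4) for x in row])
```

The diagonal is 1 because every vector is maximally similar to itself, and the matrix is symmetric because cosine similarity does not depend on argument order.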

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-dev

Metric | Value
pearson_cosine | 0.7663
spearman_cosine | 0.7664
pearson_manhattan | 0.7554
spearman_manhattan | 0.7597
pearson_euclidean | 0.756
spearman_euclidean | 0.76
pearson_dot | 0.725
spearman_dot | 0.7281
pearson_max | 0.7663
spearman_max | 0.7664

Semantic Similarity

  • Dataset: sts-test

Metric | Value
pearson_cosine | 0.7339
spearman_cosine | 0.7217
pearson_manhattan | 0.7191
spearman_manhattan | 0.7089
pearson_euclidean | 0.7203
spearman_euclidean | 0.7104
pearson_dot | 0.6972
spearman_dot | 0.6881
pearson_max | 0.7339
spearman_max | 0.7217
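The metrics above are Pearson and Spearman correlations between the gold similarity scores and the model's similarity values. As a rough illustration of the difference, the sketch below computes both on a toy set of five scores; the numbers are made up for the example and are not taken from the evaluation, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold scores and predicted similarities (illustrative only).
gold = [0.0, 0.2, 0.5, 0.8, 1.0]
predicted = [0.10, 0.15, 0.40, 0.95, 0.90]

print(round(pearson(gold, predicted), 4))
print(round(spearman(gold, predicted), 4))  # 0.9
```

Spearman only looks at the ordering of the pairs, so it is unaffected by any monotonic rescaling of the predicted similarities, whereas Pearson rewards a linear relationship.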

Training Details

Training Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 14.59 tokens, max 44 tokens
    • sentence2 (string): min 7 tokens, mean 14.52 tokens, max 40 tokens
    • score (float): min 0.0, mean 0.54, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    A plane is taking off. | An air plane is taking off. | 1.0
    A man is playing a large flute. | A man is playing a flute. | 0.76
    A man is spreading shreded cheese on a pizza. | A man is spreading shredded cheese on an uncooked pizza. | 0.76
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
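CosineSimilarityLoss with an MSELoss loss function compares the cosine similarity of each sentence pair's embeddings with the gold score and penalizes the squared difference. The following plain-Python sketch illustrates that computation on toy vectors; it is an approximation for explanation, not the library's code, and the function names and inputs are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_similarity_loss(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    scores: Sequence[float],
) -> float:
    """Mean squared error between cosine(u, v) and the gold score over a batch."""
    errors = [(cosine(u, v) - s) ** 2 for (u, v), s in zip(pairs, scores)]
    return sum(errors) / len(errors)


# Toy batch: the first pair is identical (cosine 1.0), the second is orthogonal (cosine 0.0).
batch = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
gold_scores = [1.0, 0.5]

loss = cosine_similarity_loss(batch, gold_scores)
print(loss)  # 0.125
```

The first pair contributes no error and the second contributes (0.0 - 0.5)² = 0.25, giving a batch mean of 0.125.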
    

Evaluation Dataset

sentence-transformers/stsb

  • Dataset: sentence-transformers/stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7 tokens, mean 23.91 tokens, max 64 tokens
    • sentence2 (string): min 8 tokens, mean 23.78 tokens, max 62 tokens
    • score (float): min 0.0, mean 0.47, max 1.0
  • Samples:
    sentence1 | sentence2 | score
    A man with a hard hat is dancing. | A man wearing a hard hat is dancing. | 1.0
    A young child is riding a horse. | A child is riding a horse. | 0.95
    A man is feeding a mouse to a snake. | The man is feeding a mouse to the snake. | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
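With warmup_ratio 0.1 and the linear scheduler listed under All Hyperparameters, the learning rate would be expected to ramp up over the first tenth of training and then decay linearly to zero. The sketch below shows that shape under stated assumptions: a base learning rate of 5e-05 (from the full hyperparameter list), 1440 total steps (from the training logs), and a hypothetical helper linear_warmup_lr that approximates the schedule rather than reproducing the trainer's code.

```python
import math


def linear_warmup_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-5,
    warmup_ratio: float = 0.1,
) -> float:
    """Linear ramp from 0 to base_lr during warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1440  # final step reported in the training logs

for step in (0, 72, 144, 792, 1440):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak learning rate is reached at step 144, halfway through the first epoch's evaluation intervals, and the rate returns to zero at the final step.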

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine | sts-test_spearman_cosine
0.2778 | 100 | 0.0782 | 0.0517 | 0.7096 | -
0.5556 | 200 | 0.0509 | 0.0539 | 0.7066 | -
0.8333 | 300 | 0.0458 | 0.0417 | 0.7457 | -
1.1111 | 400 | 0.0357 | 0.0385 | 0.7650 | -
1.3889 | 500 | 0.0254 | 0.0424 | 0.7532 | -
1.6667 | 600 | 0.0239 | 0.0394 | 0.7561 | -
1.9444 | 700 | 0.0235 | 0.0407 | 0.7561 | -
2.2222 | 800 | 0.0147 | 0.0395 | 0.7616 | -
2.5 | 900 | 0.011 | 0.0391 | 0.7647 | -
2.7778 | 1000 | 0.0111 | 0.0396 | 0.7634 | -
3.0556 | 1100 | 0.0108 | 0.0388 | 0.7653 | -
3.3333 | 1200 | 0.007 | 0.0392 | 0.7684 | -
3.6111 | 1300 | 0.0073 | 0.0391 | 0.7673 | -
3.8889 | 1400 | 0.007 | 0.0388 | 0.7664 | -
4.0 | 1440 | - | - | - | 0.7217
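The Epoch and Step columns can be cross-checked against the dataset size and batch size reported earlier. The short calculation below is a sketch that assumes a single device, gradient_accumulation_steps of 1, and that the final partial batch is kept (dataloader_drop_last: False), all as listed in the hyperparameters; the variable and function names are illustrative.

```python
import math

train_samples = 5749  # size of the training dataset
batch_size = 16       # per_device_train_batch_size
epochs = 4            # num_train_epochs

# The last, partially filled batch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def epoch_at(step: int) -> float:
    """Fractional epoch reached after the given number of steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 360
print(total_steps)      # 1440
print(epoch_at(100))    # 0.2778
```

These values agree with the log: step 100 corresponds to epoch 0.2778 and training ends at step 1440.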

Framework Versions

  • Python: 3.9.19
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}