SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model fine-tuned from google-bert/bert-base-cased on the all-nli dataset with a denoising auto-encoder objective (DenoisingAutoEncoderLoss, the TSDAE approach cited in the Citation section). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli
  • Language: en

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
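
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence embedding is the average of the token embeddings. The following is a minimal NumPy sketch of that idea, not the library's implementation; the function name mean_pool and the toy shapes are illustrative assumptions, and the real model works on 768-dimensional BertModel outputs.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len) with 1 for real tokens and 0 for padding
    """
    # Expand the mask to (batch, seq_len, 1) so it broadcasts over the embedding dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Clip the token count to avoid dividing by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy batch (hypothetical): 2 sentences, 3 token slots, 4 dimensions.
tokens = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 0], [1, 1, 1]])  # first sentence has one padded slot
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```

Masking before averaging matters because padded positions would otherwise pull the mean toward whatever values the padding tokens happen to produce.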

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jinoooooooooo/bert-base-cased-nli-tsdae")
# Run inference
sentences = [
    'A finds humorous that.',
    'A older gentleman finds it humorous that he is getting his picture taken while doing his laundry.',
    'A woman walks on a sidewalk wearing a white dress with a blue plaid pattern.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
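
Since the card lists cosine similarity as the similarity function, the scores returned by model.similarity can be understood as pairwise cosine similarities. A rough NumPy equivalent is sketched below; cosine_similarity_matrix is a hypothetical helper and the two-dimensional vectors are toy stand-ins for real 768-dimensional embeddings.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    # Normalise each row to unit length, then a dot product gives the cosine.
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


# Toy vectors standing in for sentence embeddings (assumption, not model output).
vectors = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
scores = cosine_similarity_matrix(vectors, vectors)
print(scores.shape)  # (3, 3)
```

Each diagonal entry is 1 because every vector is compared with itself, and orthogonal vectors score 0.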

Training Details

Training Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 557,850 training samples
  • Columns: damaged and original
  • Approximate statistics based on the first 1000 samples:
    • damaged (string): min 3 tokens, mean 5.45 tokens, max 22 tokens
    • original (string): min 7 tokens, mean 10.49 tokens, max 46 tokens
  • Samples:
    • damaged: a horse jumps a
      original: A person on a horse jumps over a broken down airplane.
    • damaged: at
      original: Children smiling and waving at camera
    • damaged: boy jumping a.
      original: A boy is jumping on skateboard in the middle of a red bridge.
  • Loss: DenoisingAutoEncoderLoss
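
The damaged column looks like the original sentence with a large share of its words deleted, which matches the deletion noise described in the TSDAE paper. The sketch below illustrates that kind of corruption under stated assumptions: it splits on whitespace, uses a deletion ratio of 0.6, and the name delete_noise is hypothetical. It is not the preprocessing code that produced this dataset, and the tokenisation used there may differ.

```python
import random


def delete_noise(sentence: str, del_ratio: float = 0.6, rng=None) -> str:
    """Randomly drop words from a sentence, keeping at least one word."""
    rng = rng or random.Random()
    words = sentence.split()  # assumption: plain whitespace tokenisation
    if not words:
        return sentence
    # Keep each word with probability (1 - del_ratio).
    kept = [w for w in words if rng.random() > del_ratio]
    if not kept:
        # Never return an empty string: fall back to one random word.
        kept = [rng.choice(words)]
    return " ".join(kept)


original = "A person on a horse jumps over a broken down airplane."
damaged = delete_noise(original, rng=random.Random(0))
print(damaged)
```

During training the model receives the damaged text and is asked to reconstruct the original, so the sentence embedding has to carry enough information to recover the missing words.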

Evaluation Dataset

all-nli

  • Dataset: all-nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: damaged and original
  • Approximate statistics based on the first 1000 samples:
    • damaged (string): min 3 tokens, mean 8.52 tokens, max 32 tokens
    • original (string): min 6 tokens, mean 18.26 tokens, max 69 tokens
  • Samples:
    • damaged: Two while packages.
      original: Two women are embracing while holding to go packages.
    • damaged: young children, with the number one with 2 are standing wooden in a bathroom in sink.
      original: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.
    • damaged: A a during world city of
      original: A man selling donuts to a customer during a world exhibition event held in the city of Angeles
  • Loss: DenoisingAutoEncoderLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
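
With learning_rate 2e-05, warmup_ratio 0.1, and the linear scheduler listed under All Hyperparameters, the learning rate ramps up over the first tenth of training and then decays linearly to zero. The sketch below reproduces that shape in plain Python. The function linear_warmup_decay_lr is a hypothetical helper, and the total of 6,250 steps is an assumption inferred from the training log, where 100 steps correspond to 0.016 of an epoch.

```python
import math


def linear_warmup_decay_lr(step: int, total_steps: int,
                           base_lr: float = 2e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given step for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 6250  # assumption: inferred from the log (100 steps per 0.016 epoch)
for step in (0, 100, 625, 3125, 6250):
    print(step, linear_warmup_decay_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak of 2e-05 is reached at step 625.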

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.016 100 7.3226 7.2198
0.032 200 3.7141 6.3506
0.048 300 3.0632 5.8854
0.064 400 2.6549 5.7539
0.08 500 2.5332 5.5007
0.096 600 2.3137 5.5201
0.112 700 2.2533 5.3476
0.128 800 2.0654 5.3438
0.144 900 1.9943 5.3552
0.16 1000 1.9587 5.2709
0.176 1100 1.8053 5.4117
0.192 1200 1.7414 5.4315
0.208 1300 1.6773 5.2983
0.224 1400 1.6035 5.5064
0.24 1500 1.5592 5.5167
0.256 1600 1.5837 5.4428
0.272 1700 1.469 5.5266
0.288 1800 1.384 5.5159
0.304 1900 1.3616 5.4305
0.32 2000 1.3065 5.5076
0.336 2100 1.3045 5.5460
0.352 2200 1.3447 5.3051
0.368 2300 1.3367 5.4867
0.384 2400 1.148 5.6086
0.4 2500 1.2229 5.5027
0.416 2600 1.16 5.4446
0.432 2700 1.1809 5.4059
0.448 2800 1.2099 5.6255
0.464 2900 1.1264 5.2683
0.48 3000 1.1589 5.3651
0.496 3100 1.0954 5.3109
0.512 3200 1.0962 5.4071
0.528 3300 1.1185 5.4022
0.544 3400 1.0656 5.2648
0.56 3500 1.0935 5.2185
0.576 3600 1.0235 5.2950
0.592 3700 1.0256 5.3534
0.608 3800 0.9711 5.2015
0.624 3900 0.9901 5.1011
0.64 4000 0.9959 5.2055
0.656 4100 1.0018 5.2456
0.672 4200 0.9836 5.3166
0.688 4300 1.0481 5.2324
0.704 4400 0.9917 5.1831
0.72 4500 0.9595 5.1268
0.736 4600 1.0096 5.1112
0.752 4700 0.9986 5.0724
0.768 4800 0.9405 5.1163
0.784 4900 0.9057 5.0673
0.8 5000 0.9938 4.9926
0.816 5100 0.9849 4.9733
0.832 5200 0.8973 5.0531
0.848 5300 0.924 5.0007
0.864 5400 0.9516 5.0079
0.88 5500 0.9637 4.9513
0.896 5600 0.9232 5.0035
0.912 5700 0.9518 4.9339
0.928 5800 0.8939 4.9783
0.944 5900 0.8752 4.9495
0.96 6000 0.9187 4.9496
0.976 6100 0.8987 4.9177
0.992 6200 0.9034 4.9224
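
In the log above, training loss falls below 1 while validation loss stays near 5, so it can help to track the lowest validation loss and the gap between the two. The sketch below parses rows in the same whitespace-separated layout; it embeds only the last five rows of the log, and the variable names are illustrative.

```python
# Last five rows of the training log: epoch, step, training loss, validation loss.
LOG_ROWS = """\
0.928 5800 0.8939 4.9783
0.944 5900 0.8752 4.9495
0.96 6000 0.9187 4.9496
0.976 6100 0.8987 4.9177
0.992 6200 0.9034 4.9224
"""

records = []
for line in LOG_ROWS.strip().splitlines():
    epoch, step, train_loss, val_loss = line.split()
    records.append({
        "epoch": float(epoch),
        "step": int(step),
        "train_loss": float(train_loss),
        "val_loss": float(val_loss),
    })

# Row with the lowest validation loss, and the train/validation gap at that row.
best = min(records, key=lambda r: r["val_loss"])
gap = best["val_loss"] - best["train_loss"]
print(best["step"], best["val_loss"], round(gap, 4))
```

Pasting the full table into LOG_ROWS would apply the same check to the whole run.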

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.4.0.dev0
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.1.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

DenoisingAutoEncoderLoss

@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}

Model size: 108M params (Safetensors, tensor type F32)