SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2 on the skill_sentence and skill_skill datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 96 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets:
    • skill_sentence
    • skill_skill

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 96, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): SmartTokenPooling({'word_embedding_dimension': 768, 'window_size': -1})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jensjorisdecorte/ConTeXT-Skill-Extraction-base")
# Run inference
sentences = [
    'Must have the ability to read and interpret schematics and effectively install and calibrate lift governors to ensure compliance with safety standards. The ideal candidate must have an ear for identifying music with commercial potential and understand the current market trends.',
    'install lift governor',
    'skill_sentence',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
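
The similarity scores above can be turned into a simple skill ranking. The following is a minimal sketch, not the authors' extraction pipeline: the helper names `cosine`, `rank_skills`, and `extract_skills` are hypothetical, and the only library call assumed is `model.encode`, as used above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    return dot / (norm_u * norm_v)


def rank_skills(sentence_embedding, skill_embeddings, skill_names, top_k=3):
    """Return the top_k (skill name, cosine similarity) pairs, best first."""
    scored = [
        (name, cosine(sentence_embedding, emb))
        for name, emb in zip(skill_names, skill_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def extract_skills(model, sentence, skill_names, top_k=3):
    """Embed one sentence and the candidate skills with `model`, then rank them.

    `model` is assumed to be the SentenceTransformer loaded above.
    """
    embeddings = model.encode([sentence] + list(skill_names))
    return rank_skills(embeddings[0], embeddings[1:], skill_names, top_k)
```

For example, `extract_skills(model, sentences[0], ["install lift governor", "water chemistry analysis", "prepare bread products"])` would return the candidate skills ordered by similarity to the job-ad sentence.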

Training Details

Training Datasets

skill_sentence

  • Dataset: skill_sentence
  • Size: 138,260 training samples
  • Columns: anchor, positive, and type
  • Approximate statistics based on the first 1000 samples:
              anchor          positive       type
    type      string          string         string
    min       9 tokens        3 tokens       5 tokens
    mean      35.67 tokens    6.12 tokens    5.0 tokens
    max       63 tokens       15 tokens      5 tokens
  • Samples:
    • anchor: duties for this role will include conducting water chemistry analysis and managing the laboratory. seeking a seasoned print manufacturing manager with knowledge of printing materials, processes and equipment.
      positive: water chemistry analysis
      type: skill_sentence
    • anchor: divers must understand how to calculate dive times and limits to ensure they return safely. We are searching for a multimedia software expert with experience in sound, lighting and recording software.
      positive: comply with the planned time for the depth of the dive
      type: skill_sentence
    • anchor: A successful candidate will possess the ability to calibrate laboratory equipment according to industry standards. we are seeking a candidate with experience in preparing government funding dossiers
      positive: prepare government funding dossiers
      type: skill_sentence
  • Loss: custom_losses.HardMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20,
        "similarity_fct": "<lambda>"
    }
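
HardMultipleNegativesRankingLoss is a custom loss, and its hard-negative handling and `<lambda>` similarity function are not described in this card. As a rough orientation only, the sketch below shows the standard multiple negatives ranking objective with in-batch negatives, assuming cosine similarity and the `scale` of 20 listed above. The function names are hypothetical and this is not the custom loss itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all positives.

    Every other positive in the batch acts as a negative for anchor i.
    Cosine similarity is an assumption; the card only lists "<lambda>".
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the row of logits.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)
```

With a larger batch, each anchor sees more in-batch negatives, which is one reason this family of losses is usually trained with large batch sizes.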
    

skill_skill

  • Dataset: skill_skill
  • Size: 13,891 training samples
  • Columns: anchor, positive, and type
  • Approximate statistics based on the first 1000 samples:
              anchor          positive       type
    type      string          string         string
    min       6 tokens        3 tokens       5 tokens
    mean      29.09 tokens    6.24 tokens    5.0 tokens
    max       96 tokens       16 tokens      5 tokens
  • Samples:
    • anchor: Adapt and move set pieces during rehearsals and live performances.
      positive: adapt sets
      type: skill_skill
    • anchor: Prepare bread and bread products such as sandwiches for consumption.
      positive: prepare bread products
      type: skill_skill
    • anchor: The strategies, methods and techniques that increase the organisation's capacity to protect and sustain the services and operations that fulfil the organisational mission and create lasting values by effectively addressing the combined issues of security, preparedness, risk and disaster recovery.
      positive: organisational resilience
      type: skill_skill
  • Loss: CachedMultipleNegativesSymmetricRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 64
    }
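
For reference, the symmetric ranking objective can be written as follows. This is the usual formulation for a batch of B (anchor, positive) pairs, given here as an assumption about what the library computes rather than a transcription of its code. The symbols a_i, p_j, s_ij and λ are notation introduced for this sketch, with λ = 20 taken from `scale` above. To my understanding, `mini_batch_size` only controls how the batch is chunked to save memory and does not change the value of the loss.

```latex
s_{ij} = \lambda \, \cos(a_i, p_j), \qquad \lambda = 20

\mathcal{L} = \frac{1}{2B} \sum_{i=1}^{B}
  \left[
    -\log \frac{e^{s_{ii}}}{\sum_{j=1}^{B} e^{s_{ij}}}
    \;-\; \log \frac{e^{s_{ii}}}{\sum_{j=1}^{B} e^{s_{ji}}}
  \right]
```

The first term asks each anchor to retrieve its own positive; the second asks each positive to retrieve its own anchor.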
    

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • eval_strategy: steps
  • per_device_train_batch_size: 4096
  • per_device_eval_batch_size: 4096
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
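
A minimal sketch of how these values might be passed to the trainer configuration, assuming the `SentenceTransformerTrainingArguments` class from Sentence Transformers 3.x. The helper `build_training_args` and its `output_dir` argument are hypothetical and do not come from the original training script.

```python
# Non-default values copied from the list above; all other arguments keep their defaults.
non_default_args = {
    "overwrite_output_dir": True,
    "eval_strategy": "steps",
    "per_device_train_batch_size": 4096,
    "per_device_eval_batch_size": 4096,
    "num_train_epochs": 1,
    "warmup_ratio": 0.1,
    "fp16": True,  # mixed precision; typically requires a GPU
    "load_best_model_at_end": True,
}


def build_training_args(output_dir):
    """Build training arguments; `output_dir` is a path chosen by the caller."""
    # Imported lazily so the dictionary can be inspected without the library installed.
    from sentence_transformers import SentenceTransformerTrainingArguments

    return SentenceTransformerTrainingArguments(output_dir=output_dir, **non_default_args)
```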

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4096
  • per_device_eval_batch_size: 4096
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step
0.1053 4
0.2105 8
0.3158 12
0.4211 16
0.5263 20
0.6316 24
0.7368 28
0.8421 32
0.9474 36
  • The bold row denotes the saved checkpoint.
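
The epoch fractions in the log can be reproduced from the dataset sizes and batch size reported above. The sketch below assumes a single device and that each dataset is batched separately without dropping the last partial batch, which is consistent with the logged values.

```python
import math

# Sizes and batch size as reported in this card.
dataset_sizes = {"skill_sentence": 138_260, "skill_skill": 13_891}
batch_size = 4096

# Batches contributed by each dataset in one epoch.
batches = {name: math.ceil(n / batch_size) for name, n in dataset_sizes.items()}
steps_per_epoch = sum(batches.values())


def epoch_at(step):
    """Fraction of the epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(batches)          # {'skill_sentence': 34, 'skill_skill': 4}
print(steps_per_epoch)  # 38
print(epoch_at(4))      # 0.1053
```

With 38 steps per epoch, logging every 4 steps gives exactly the nine rows shown, ending at step 36.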

Framework Versions

  • Python: 3.9.19
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cu118
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 109M params (Safetensors, tensor type F32)