SentenceTransformer based on x2bee/ModernBert_MLM_kotoken_v03

This is a sentence-transformers model finetuned from x2bee/ModernBert_MLM_kotoken_v03 on the misc_sts_pairs_v2_kor dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base Model: x2bee/ModernBert_MLM_kotoken_v03
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: misc_sts_pairs_v2_kor
  • Language: Korean

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
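
For context, the Pooling module above averages the encoder's token embeddings (mean pooling) to produce the 768-dimensional sentence embedding. Below is a minimal sketch of the equivalent computation with the transformers library; it assumes the transformer backbone and tokenizer of this repository load directly via AutoModel/AutoTokenizer.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumption: the transformer backbone of this repo loads with AutoModel/AutoTokenizer.
model_id = "x2bee/KoModernBERT-base-nli-sts-SBERT_v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
    # Mean pooling: average the token embeddings, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

print(encode(["안녕하세요, 만나서 반갑습니다."]).shape)  # torch.Size([1, 768])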

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")
# Run inference
sentences = [
    # The first two sentences are near-paraphrases about fixing a manual car window;
    # the third is an unrelated Doctor Who question.
    '수동 운전석 창문을 어떻게 수리하나요?',
    '1992년형 혼다 시빅에서 올라가지 않는 수동 창문을 어떻게 수리하나요?',
    '아홉 번째 닥터가 멈춘 닥터 후 에피소드는 무엇입니까?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
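
The same embeddings can also be ranked against a query for semantic search. A small sketch continuing from the snippet above, using the library's util helpers (the query string is only an illustrative example, not from this card):

from sentence_transformers import util

# Illustrative query, reusing the `sentences` list above as the corpus.
query_embedding = model.encode("혼다 시빅 수동 창문 수리 방법", convert_to_tensor=True)
corpus_embeddings = model.encode(sentences, convert_to_tensor=True)

# util.semantic_search returns the top-k corpus entries by cosine similarity per query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(round(hit["score"], 3), sentences[hit["corpus_id"]])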

Evaluation

Metrics

Semantic Similarity

Metric              Value
pearson_cosine      0.524
spearman_cosine     0.5139
pearson_euclidean   0.5051
spearman_euclidean  0.5001
pearson_manhattan   0.5087
spearman_manhattan  0.504
pearson_dot         0.4545
spearman_dot        0.4439
pearson_max         0.524
spearman_max        0.5139
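
These metric names match the output of sentence-transformers' EmbeddingSimilarityEvaluator, which correlates the model's cosine/euclidean/manhattan/dot similarities with the gold scores. A hedged sketch of reproducing such an evaluation; the dataset repository id and split are assumptions, since the card only names misc_sts_pairs_v2_kor at revision 845f810:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("x2bee/KoModernBERT-base-nli-sts-SBERT_v01")

# Assumption: the dataset lives at this repo id and exposes the columns listed below.
eval_ds = load_dataset("x2bee/misc_sts_pairs_v2_kor", split="train").select(range(1000))

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=eval_ds["sentence1"],
    sentences2=eval_ds["sentence2"],
    scores=eval_ds["score"],
    name="sts_dev",
)
print(evaluator(model))  # Pearson/Spearman correlations for cosine, euclidean, manhattan, dot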

Training Details

Training Dataset

misc_sts_pairs_v2_kor

  • Dataset: misc_sts_pairs_v2_kor at 845f810
  • Size: 449,904 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 6, mean 17.81, max 49 tokens
    • sentence2 (string): min 6, mean 17.78, max 80 tokens
    • score (float): min 0.53, mean 0.75, max 0.98
  • Samples:
    • sentence1: 1999년형 유콘 4륜구동 차량의 앞쪽 조수석 타이어에서 발생하는 갈리는 소음의 원인은 무엇인가요?
      sentence2: 차의 오른쪽 앞쪽에서 발생하는 갈리는 소리의 원인은 무엇인가요?
      score: 0.8193586337477191
    • sentence1: 왜 제임스타운 정착민들은 그곳의 원주민들과 갈등을 겪었는가?
      sentence2: 왜 제임스타운은 원주민들과 갈등을 겪었는가?
      score: 0.8701910827908218
    • sentence1: 옥수수 전분을 섭취하는 것이 건강에 어떤 영향을 미칠 수 있습니까?
      sentence2: 옥수수 전분을 섭취하면 당신에게 어떤 영향을 미칠까요?
      score: 0.8809354609563622
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
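
CosineSimilarityLoss embeds both sentences of a pair, takes the cosine similarity of the two embeddings, and regresses it onto the gold score with the MSELoss shown above. A minimal conceptual sketch of that computation (dummy embeddings, gold scores taken from the samples above):

import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb1, emb2, gold_scores):
    # Predicted similarity of each sentence pair, in [-1, 1].
    pred = F.cosine_similarity(emb1, emb2)
    # loss_fct = torch.nn.MSELoss: mean squared error against the gold scores.
    return F.mse_loss(pred, gold_scores)

emb1, emb2 = torch.randn(3, 768), torch.randn(3, 768)
gold = torch.tensor([0.8194, 0.8702, 0.8809])
print(cosine_similarity_loss(emb1, emb2, gold))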
    

Evaluation Dataset

misc_sts_pairs_v2_kor

  • Dataset: misc_sts_pairs_v2_kor at 845f810
  • Size: 449,904 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 7, mean 17.76, max 65 tokens
    • sentence2 (string): min 6, mean 17.65, max 52 tokens
    • score (float): min 0.53, mean 0.75, max 0.98
  • Samples:
    • sentence1: 용광로의 온도는 얼마나 뜨거운가?
      sentence2: 용광로의 온도는 얼마나 높습니까?
      score: 0.751853250408994
    • sentence1: 영어로 'Lei è il mio uno e solo'는 어떻게 철자하나요?
      sentence2: 'Lei è il mio uno e solo'의 영어 동등어는 무엇인가요?
      score: 0.8265661603331053
    • sentence1: 버드와이저 포커 광고에 나오는 소녀는 누구인가요?
      sentence2: 포커 스타일의 버드와이저 광고에 나오는 소녀는 누구인가요?
      score: 0.9301912848973812
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 4
  • learning_rate: 1e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.3
  • push_to_hub: True
  • hub_model_id: x2bee/KoModernBERT-base-nli-sts-SBERT_v01
  • batch_sampler: no_duplicates
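
For reference, a hedged sketch of how the non-default hyperparameters above map onto a sentence-transformers 3.x training run. The dataset repository id and output_dir are assumptions for illustration; the remaining values mirror the list above.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("x2bee/ModernBert_MLM_kotoken_v03")
train_dataset = load_dataset("x2bee/misc_sts_pairs_v2_kor", split="train")  # assumed repo id

args = SentenceTransformerTrainingArguments(
    output_dir="output/KoModernBERT-base-nli-sts-SBERT_v01",  # placeholder path
    eval_strategy="epoch",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    num_train_epochs=2,
    warmup_ratio=0.3,
    push_to_hub=True,
    hub_model_id="x2bee/KoModernBERT-base-nli-sts-SBERT_v01",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.CosineSimilarityLoss(model),
)
trainer.train()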

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: x2bee/KoModernBERT-base-nli-sts-SBERT_v01
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step  Training Loss  Validation Loss  sts_dev_spearman_max
0       0     -              -                0.5070
0.2397  100   0.0311         -                -
0.4793  200   0.0082         -                -
0.7190  300   0.0065         -                -
0.9587  400   0.0061         -                -
1.0     418   -              0.0059           0.4899
1.1965  500   0.0058         -                -
1.4362  600   0.0057         -                -
1.6759  700   0.0055         -                -
1.9155  800   0.0053         -                -
1.9970  834   -              0.0057           0.5139

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
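
The ModernBERT architecture requires a transformers release with ModernBERT support (this card was built against a 4.48 development build). An approximate, hedged install line reflecting the versions above:

pip install -U "sentence-transformers>=3.3.1" "transformers>=4.48.0" "torch>=2.5.1"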

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}