SentenceTransformer based on vinai/phobert-base-v2

This is a sentence-transformers model finetuned from vinai/phobert-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: vinai/phobert-base-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
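
The Pooling module above produces sentence embeddings by mean-pooling the token embeddings of the underlying RobertaModel. Below is a minimal sketch of that computation using the Hugging Face transformers API directly; it assumes the transformer weights and tokenizer can be loaded from the repository root with AutoModel/AutoTokenizer, and is for illustration only, not a replacement for the SentenceTransformer wrapper.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: the transformer weights and tokenizer are stored at the repo root.
tokenizer = AutoTokenizer.from_pretrained("huudan123/stag_123_cp10000")
model = AutoModel.from_pretrained("huudan123/stag_123_cp10000")

sentences = ["Hai người đàn ông đang đợi một chuyến đi bên lề đường đất."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, ignoring padded positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embeddings.shape)  # torch.Size([1, 768])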

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("huudan123/stag_123_cp10000")
# Run inference
sentences = [
    # "The simple answer is yes, flower buds on lettuce are a sure sign of bolting."
    'Câu trả lời đơn giản là có, chồi hoa trên rau diếp là một dấu hiệu chắc chắn của việc bắt vít.',
    # "It seems like it has already started."
    'Có vẻ như nó đã bắt đầu bắt đầu.',
    # "Two men are waiting for a ride on the side of a dirt road."
    'Hai người đàn ông đang đợi một chuyến đi bên lề đường đất.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
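
Because the model maps sentences into a shared vector space, the same API supports a simple semantic-search pattern: encode a query and a corpus separately, then rank the corpus by similarity. A small sketch follows; the query string is a made-up example, and model.similarity defaults to the cosine similarity configured above.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("huudan123/stag_123_cp10000")

# Hypothetical query: "Two men are waiting for a car by the road."
query = "Hai người đàn ông đang chờ xe bên đường."
corpus = [
    'Câu trả lời đơn giản là có, chồi hoa trên rau diếp là một dấu hiệu chắc chắn của việc bắt vít.',
    'Có vẻ như nó đã bắt đầu bắt đầu.',
    'Hai người đàn ông đang đợi một chuyến đi bên lề đường đất.',
]

query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)

# Rank corpus sentences by similarity to the query (cosine by default).
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 3]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())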

Evaluation

Metrics

Semantic Similarity

Metric              Value
------------------  ------
pearson_cosine      0.5692
spearman_cosine     0.5882
pearson_manhattan   0.7068
spearman_manhattan  0.7122
pearson_euclidean   0.4671
spearman_euclidean  0.5309
pearson_dot         0.3262
spearman_dot        0.4802
pearson_max         0.7068
spearman_max        0.7122
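
These correlations come from an STS-style embedding-similarity evaluation: the model's predicted similarity for each sentence pair is correlated with a gold similarity score. The sketch below shows how pearson_cosine and spearman_cosine can be computed with scipy; the sentence pairs and gold scores are placeholders, not the actual evaluation data.

from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("huudan123/stag_123_cp10000")

# Placeholder STS-style pairs with gold similarity scores (e.g., on a 0-5 scale).
sentences1 = ["Một người đàn ông đang chơi đàn ghi-ta.", "Trời đang mưa.", "Một con chó đang chạy trên bãi biển."]
sentences2 = ["Một người đang chơi nhạc.", "Trời nắng đẹp.", "Một con chó chạy trên cát."]
gold_scores = [4.0, 0.5, 3.8]

emb1 = model.encode(sentences1)
emb2 = model.encode(sentences2)

# Per-pair cosine similarity is the diagonal of the pairwise similarity matrix.
cosine_scores = model.similarity(emb1, emb2).diagonal().tolist()

print("pearson_cosine:", pearsonr(cosine_scores, gold_scores)[0])
print("spearman_cosine:", spearmanr(cosine_scores, gold_scores)[0])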

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • gradient_accumulation_steps: 2
  • learning_rate: 1e-05
  • num_train_epochs: 15
  • lr_scheduler_type: cosine_with_restarts
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • gradient_checkpointing: True
  • batch_sampler: no_duplicates
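
These are the arguments exposed by Sentence Transformers v3 training; a hedged sketch of how the non-default values above could be passed via SentenceTransformerTrainingArguments is shown below. The output_dir is a placeholder, and the dataset, loss, and trainer wiring are omitted.

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

# Sketch only: mirrors the non-default hyperparameters listed above.
# "output/stag_123" is a placeholder output directory.
args = SentenceTransformerTrainingArguments(
    output_dir="output/stag_123",
    overwrite_output_dir=True,
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    num_train_epochs=15,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)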

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss stage1 loss stage2 loss stage3 loss sts-evaluator_spearman_max
0 0 - - - - 0.6643
0.0877 100 4.3054 - - - -
0.1754 200 3.93 - - - -
0.2632 300 3.585 - - - -
0.3509 400 3.4482 - - - -
0.4386 500 3.1858 4.3297 2.6006 0.1494 0.7527
0.5263 600 3.141 - - - -
0.6140 700 2.9477 - - - -
0.7018 800 2.6271 - - - -
0.7895 900 2.6175 - - - -
0.8772 1000 2.4931 2.9001 2.3487 0.1593 0.6907
0.9649 1100 2.4516 - - - -
1.0526 1200 2.4662 - - - -
1.1404 1300 2.5022 - - - -
1.2281 1400 2.4325 - - - -
1.3158 1500 2.4058 2.7163 2.1658 0.1392 0.7121
1.4035 1600 2.3305 - - - -
1.4912 1700 2.2677 - - - -
1.5789 1800 2.2555 - - - -
1.6667 1900 2.2275 - - - -
1.7544 2000 2.1846 2.5441 2.1172 0.1293 0.6781
1.8421 2100 2.2007 - - - -
1.9298 2200 2.192 - - - -
2.0175 2300 2.1491 - - - -
2.1053 2400 2.2419 - - - -
2.1930 2500 2.1822 2.4765 2.0476 0.1055 0.6893
2.2807 2600 2.1384 - - - -
2.3684 2700 2.1379 - - - -
2.4561 2800 2.0558 - - - -
2.5439 2900 2.057 - - - -
2.6316 3000 2.0263 2.4108 2.0751 0.0904 0.7016
2.7193 3100 1.9587 - - - -
2.8070 3200 2.0702 - - - -
2.8947 3300 2.0058 - - - -
2.9825 3400 2.0093 - - - -
3.0702 3500 2.0347 2.3948 1.9958 0.0937 0.7131
3.1579 3600 2.0071 - - - -
3.2456 3700 1.9708 - - - -
3.3333 3800 2.027 - - - -
3.4211 3900 1.9432 - - - -
3.5088 4000 1.9245 2.3858 2.0274 0.0831 0.7197
3.5965 4100 1.8814 - - - -
3.6842 4200 1.8619 - - - -
3.7719 4300 1.8987 - - - -
3.8596 4400 1.8764 - - - -
3.9474 4500 1.8908 2.3753 2.0066 0.0872 0.7052
4.0351 4600 1.8737 - - - -
4.1228 4700 1.9289 - - - -
4.2105 4800 1.8755 - - - -
4.2982 4900 1.8542 - - - -
4.3860 5000 1.8514 2.3731 2.0023 0.0824 0.7191
4.4737 5100 1.7939 - - - -
4.5614 5200 1.8126 - - - -
4.6491 5300 1.7662 - - - -
4.7368 5400 1.7448 - - - -
4.8246 5500 1.7736 2.3703 2.0038 0.0768 0.7044
4.9123 5600 1.7993 - - - -
5.0 5700 1.7811 - - - -
5.0877 5800 1.7905 - - - -
5.1754 5900 1.7539 - - - -
5.2632 6000 1.7393 2.3568 2.0173 0.0853 0.7263
5.3509 6100 1.7882 - - - -
5.4386 6200 1.682 - - - -
5.5263 6300 1.7175 - - - -
5.6140 6400 1.6806 - - - -
5.7018 6500 1.6243 2.3715 2.0202 0.0770 0.7085
5.7895 6600 1.7079 - - - -
5.8772 6700 1.6743 - - - -
5.9649 6800 1.6897 - - - -
6.0526 6900 1.668 - - - -
6.1404 7000 1.6806 2.3826 1.9925 0.0943 0.7072
6.2281 7100 1.6394 - - - -
6.3158 7200 1.6738 - - - -
6.4035 7300 1.6382 - - - -
6.4912 7400 1.6109 - - - -
6.5789 7500 1.5864 2.3849 2.0064 0.0831 0.7200
6.6667 7600 1.5838 - - - -
6.7544 7700 1.5776 - - - -
6.8421 7800 1.5904 - - - -
6.9298 7900 1.6198 - - - -
7.0175 8000 1.5661 2.3917 2.0038 0.0746 0.7131
7.1053 8100 1.6253 - - - -
7.1930 8200 1.5564 - - - -
7.2807 8300 1.5947 - - - -
7.3684 8400 1.5982 - - - -
7.4561 8500 1.53 2.3761 2.0162 0.0775 0.7189
7.5439 8600 1.5412 - - - -
7.6316 8700 1.5287 - - - -
7.7193 8800 1.4652 - - - -
7.8070 8900 1.5611 - - - -
7.8947 9000 1.5258 2.3870 1.9896 0.0828 0.7126
7.9825 9100 1.552 - - - -
8.0702 9200 1.5287 - - - -
8.1579 9300 1.4889 - - - -
8.2456 9400 1.4893 - - - -
8.3333 9500 1.5538 2.3810 1.9956 0.0772 0.7181
8.4211 9600 1.4863 - - - -
8.5088 9700 1.4894 - - - -
8.5965 9800 1.4516 - - - -
8.6842 9900 1.4399 - - - -
8.7719 10000 1.4699 2.3991 1.9760 0.0894 0.7122
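
The stage1/stage2/stage3 loss columns indicate that training combined several objectives. The exact per-stage configuration is not documented in this card, but the losses cited under Citation below (TripletLoss and MultipleNegativesRankingLoss) can be constructed in Sentence Transformers as in this minimal sketch.

from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("vinai/phobert-base-v2")

# Sketch only: instantiate the loss classes cited below; how they were
# combined across the training stages is not documented in this card.
mnr_loss = losses.MultipleNegativesRankingLoss(model)
triplet_loss = losses.TripletLoss(model)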

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}