SentenceTransformer based on vinai/phobert-base-v2

This is a sentence-transformers model fine-tuned from vinai/phobert-base-v2. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: vinai/phobert-base-v2
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
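The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence vector is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, assuming padded positions are marked with 0 in an attention mask; `mean_pool` is a hypothetical helper name and the 2-dimensional vectors are toy values, not model output.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions whose mask value is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [s / count for s in summed]


# Three token vectors; the last one is padding and is excluded from the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In the real model the same averaging runs over 768-dimensional token vectors produced by the RobertaModel layer.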

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("huudan123/stage1")
# Run inference
sentences = [
    'Báo đen đã editorialized chống lại những cuộc viếng_thăm của farrakhan với các nhà độc_tài châu phi .',
    'Báo đen đã viết về quá_khứ của farrakhan .',
    'Báo đen từ_chối yểm_trợ cho farrakhan .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
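Once embeddings are available, semantic search amounts to ranking candidates by cosine similarity to a query. The sketch below illustrates this with hand-written toy vectors standing in for real `model.encode` output; `cosine` and `rank_by_similarity` are hypothetical helper names, not part of the library.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, corpus):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors (assumed values, not produced by the model).
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = rank_by_similarity(query, corpus)
print([index for index, _ in ranking])  # [1, 0, 2]
```

With real embeddings, `model.similarity` computes the same pairwise cosine scores in one call, so a helper like this is only needed for illustration.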

Evaluation

Metrics

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.4203
spearman_cosine	0.5148
pearson_manhattan	0.5605
spearman_manhattan	0.5792
pearson_euclidean	0.471
spearman_euclidean	0.5087
pearson_dot	0.3924
spearman_dot	0.4338
pearson_max	0.5605
spearman_max	0.5792
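Each metric in the table is a correlation between gold similarity labels and the model's predicted scores: Pearson measures linear agreement, Spearman measures agreement of rankings. The following from-scratch sketch shows how the two differ, using made-up `gold` and `pred` values; it is an illustration of the statistics, not the evaluator's own code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up labels and scores: same ordering, but not a linear relationship.
gold = [0.0, 1.0, 2.0, 3.0]
pred = [0.1, 0.2, 0.9, 0.95]
print(spearman(gold, pred))  # 1.0
print(pearson(gold, pred))   # below 1.0
```

Because Spearman depends only on ordering, it can exceed Pearson when scores are monotonic but not linear in the labels, which is consistent with the gap between the two rows above.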

Training Details

Training Dataset

Unnamed Dataset

Size: 102,178 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens mean: 27.28 tokens max: 147 tokens	min: 4 tokens mean: 14.99 tokens max: 44 tokens	min: 4 tokens mean: 14.34 tokens max: 34 tokens

Samples:

anchor	positive	negative
`Tem đầy màu_sắc của madeira , cũng như tiền xu , ghi_chép ngân_hàng , và các mặt_hàng khác như bưu_thiếp là mối quan_tâm đến nhiều nhà sưu_tập .`	`Các nhà sưu_tập sẽ thích ghé thăm madeira bởi_vì những phân_chia lớn của tem , ghi_chép ngân_hàng , bưu_thiếp , và nhiều mặt_hàng khác họ có_thể đọc được .`	`Mọi người quan_tâm đến việc bắt_đầu bộ sưu_tập mới nên thoát madeira và đi du_lịch phía bắc , nơi họ có khả_năng tìm thấy các cửa_hàng tốt .`
`Cẩn_thận đấy , ông inglethorp . Poirot bị bồn_chồn .`	`Hãy chăm_sóc ông inglethorp .`	`Không cần phải cẩn_thận với anh ta .`
`Phải có một_chút hoài_nghi về trải nghiệm cá_nhân của sperling với trò_chơi .`	`Hãy suy_nghĩ về những tác_động khi nhìn vào kinh_nghiệm của anh ấy .`	`Một người có_thể lấy trải nghiệm cá_nhân của sperling với giá_trị mặt .`

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
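With a Euclidean distance metric and a margin of 5, the triplet objective asks the negative to sit at least 5 units farther from the anchor than the positive does. Below is a minimal sketch of the per-triplet loss, max(d(a, p) - d(a, n) + margin, 0), on toy 2-dimensional vectors; the helper names are hypothetical and the vectors are not model output.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on the gap between positive and negative distances."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # distance 5 from the anchor
negative = [6.0, 8.0]   # distance 10 from the anchor

# The negative is exactly one margin farther away, so the loss is zero.
print(triplet_loss(anchor, positive, negative))  # 0.0

# If the negative is no farther than the positive, the full margin is paid.
print(triplet_loss(anchor, positive, positive))  # 5.0
```

During training this value is averaged over the triplets in a batch, so the logged loss falls as more triplets satisfy the margin.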

Evaluation Dataset

Unnamed Dataset

Size: 12,772 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens mean: 27.81 tokens max: 164 tokens	min: 3 tokens mean: 14.94 tokens max: 42 tokens	min: 4 tokens mean: 14.4 tokens max: 39 tokens

Samples:

anchor	positive	negative
`Tình_yêu , anh có muốn em trở_thành kassandra lubbock của anh không ?`	`Tôi có_thể là kassandra lubbock của anh .`	`Tôi từ_chối trở_thành kassandra lubbock của anh .`
`Ví_dụ , trong mùa thu năm 1997 , ủy ban điều_trị hạt_nhân ( nrc ) văn_phòng thanh_tra tướng liệu nrc để có được quan_điểm của họ trên văn_hóa an_toàn của đại_lý .`	`Nhân_viên nrc đã được hỏi về quan_điểm của họ trên văn_hóa an_toàn của đại_lý .`	`Các nhân_viên không bao_giờ quan_sát về quan_điểm của họ về văn_hóa an_toàn của đại_lý trong mùa thu năm 1997 .`
`Mỗi năm kem của trẻ nghệ và comedic tài_năng làm cho nó đường đến edinburgh , và fringe đã lớn lên trong việc huấn_luyện lớn nhất trong khung_cảnh lớn nhất cho các diễn_viên phát_triển trên thế_giới .`	`Tài_năng mới đến edinburgh .`	`Tài_năng mới đến dublin .`

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}

Training Hyperparameters

Non-Default Hyperparameters

overwrite_output_dir: True
eval_strategy: epoch
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
num_train_epochs: 20
lr_scheduler_type: cosine
warmup_ratio: 0.05
fp16: True
load_best_model_at_end: True
gradient_checkpointing: True
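The `lr_scheduler_type: cosine` and `warmup_ratio: 0.05` settings together describe a learning rate that ramps up linearly and then decays along a half cosine. The sketch below assumes that usual shape, the default `learning_rate` of 5e-05 from the full list, and a total of 20 × 799 steps (799 steps per epoch is taken from the training log); `cosine_lr` is a hypothetical helper, not a library function.

```python
import math


def cosine_lr(step, base_lr, total_steps, warmup_steps):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed run length: 20 epochs at 799 optimizer steps per epoch.
total_steps = 20 * 799
warmup_steps = math.ceil(0.05 * total_steps)
base_lr = 5e-5

for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(step, cosine_lr(step, base_lr, total_steps, warmup_steps))
```

The rate starts at zero, peaks at `base_lr` when warmup ends, and returns to zero at the final step.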

All Hyperparameters

overwrite_output_dir: True
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.05
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: True
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	sts-dev_spearman_cosine
0	0	-	-	0.6643
0.0626	50	4.6946	-	-
0.1252	100	4.031	-	-
0.1877	150	2.7654	-	-
0.2503	200	2.4176	-	-
0.3129	250	2.1111	-	-
0.3755	300	2.0263	-	-
0.4380	350	1.9296	-	-
0.5006	400	1.7793	-	-
0.5632	450	1.7903	-	-
0.6258	500	1.7638	-	-
0.6884	550	1.7042	-	-
0.7509	600	1.7038	-	-
0.8135	650	1.6221	-	-
0.8761	700	1.6172	-	-
0.9387	750	1.6227	-	-
1.0	799	-	1.5275	0.5219
1.0013	800	1.6264	-	-
1.0638	850	1.364	-	-
1.1264	900	1.4447	-	-
1.1890	950	1.4161	-	-
1.2516	1000	1.3575	-	-
1.3141	1050	1.3554	-	-
1.3767	1100	1.378	-	-
1.4393	1150	1.3806	-	-
1.5019	1200	1.3089	-	-
1.5645	1250	1.4314	-	-
1.6270	1300	1.3672	-	-
1.6896	1350	1.3777	-	-
1.7522	1400	1.3282	-	-
1.8148	1450	1.3432	-	-
1.8773	1500	1.3101	-	-
1.9399	1550	1.2919	-	-
2.0	1598	-	1.3643	0.5667
2.0025	1600	1.2969	-	-
2.0651	1650	0.9629	-	-
2.1277	1700	0.9878	-	-
2.1902	1750	0.9437	-	-
2.2528	1800	0.9832	-	-
2.3154	1850	0.9584	-	-
2.3780	1900	1.0689	-	-
2.4406	1950	1.0579	-	-
2.5031	2000	0.9888	-	-
2.5657	2050	0.9452	-	-
2.6283	2100	0.9378	-	-
2.6909	2150	0.9553	-	-
2.7534	2200	0.9337	-	-
2.8160	2250	1.0184	-	-
2.8786	2300	0.9663	-	-
2.9412	2350	0.9686	-	-
3.0	2397	-	1.3488	0.5442
3.0038	2400	0.9618	-	-
3.0663	2450	0.6878	-	-
3.1289	2500	0.6883	-	-
3.1915	2550	0.6498	-	-
3.2541	2600	0.6651	-	-
3.3166	2650	0.6554	-	-
3.3792	2700	0.7033	-	-
3.4418	2750	0.6416	-	-
3.5044	2800	0.7068	-	-
3.5670	2850	0.6834	-	-
3.6295	2900	0.7099	-	-
3.6921	2950	0.7306	-	-
3.7547	3000	0.7105	-	-
3.8173	3050	0.7072	-	-
3.8798	3100	0.7248	-	-
3.9424	3150	0.7216	-	-
4.0	3196	-	1.3358	0.5307
4.0050	3200	0.693	-	-
4.0676	3250	0.4741	-	-
4.1302	3300	0.4593	-	-
4.1927	3350	0.449	-	-
4.2553	3400	0.4326	-	-
4.3179	3450	0.4488	-	-
4.3805	3500	0.4762	-	-
4.4431	3550	0.4723	-	-
4.5056	3600	0.4713	-	-
4.5682	3650	0.4612	-	-
4.6308	3700	0.4537	-	-
4.6934	3750	0.4928	-	-
4.7559	3800	0.4568	-	-
4.8185	3850	0.4771	-	-
4.8811	3900	0.4688	-	-
4.9437	3950	0.4549	-	-
5.0	3995	-	1.4027	0.5360
5.0063	4000	0.5048	-	-
5.0688	4050	0.2822	-	-
5.1314	4100	0.3069	-	-
5.1940	4150	0.2971	-	-
5.2566	4200	0.3191	-	-
5.3191	4250	0.3023	-	-
5.3817	4300	0.3224	-	-
5.4443	4350	0.3114	-	-
5.5069	4400	0.3098	-	-
5.5695	4450	0.3071	-	-
5.6320	4500	0.3478	-	-
5.6946	4550	0.3288	-	-
5.7572	4600	0.3373	-	-
5.8198	4650	0.3577	-	-
5.8824	4700	0.331	-	-
5.9449	4750	0.3132	-	-
6.0	4794	-	1.4036	0.5148

The bold row denotes the saved checkpoint.

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.42.4
PyTorch: 2.3.1+cu121
Accelerate: 0.32.1
Datasets: 2.20.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
