SentenceTransformer based on huudan123/stage1

This is a sentence-transformers model finetuned from huudan123/stage1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: huudan123/stage1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
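
The sequence length and output dimensionality above can be checked directly on the loaded model. A quick sketch, assuming the model loads as in the Usage section below:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("huudan123/stage2")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768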

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
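
For reference, the Pooling block above applies mask-aware mean pooling (pooling_mode_mean_tokens) over the RobertaModel token embeddings. A minimal sketch of that step using transformers directly, assuming the repository's underlying Roberta weights and tokenizer load via AutoModel/AutoTokenizer:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huudan123/stage2")
encoder = AutoModel.from_pretrained("huudan123/stage2")

encoded = tokenizer(
    ["bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean over real tokens only: zero out padding, sum, divide by the token count
mask = encoded["attention_mask"].unsqueeze(-1).float()       # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embedding.shape)                               # torch.Size([1, 768])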

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("huudan123/stage2")
# Run inference
sentences = [
    'bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu',
    'bạn mọi thứ bạn bắt_đầu_từ',
    'bạn tiếp_tục bạn nhập mọi thứ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
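
The same embeddings support simple semantic search: encode a corpus once, then rank it against a query embedding. A small sketch (the corpus and query strings are illustrative word-segmented Vietnamese, not from any dataset):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("huudan123/stage2")

# Toy corpus and query, for illustration only
corpus = [
    'bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu',
    'bạn mọi thứ bạn bắt_đầu_từ',
]
query = 'bạn tiếp_tục bạn nhập mọi thứ'

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarities between the query and every corpus sentence
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
best = scores[0].argmax().item()
print(corpus[best], scores[0, best].item())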

Evaluation

Metrics

Semantic Similarity

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.7133 |
| spearman_cosine    | 0.714  |
| pearson_manhattan  | 0.6924 |
| spearman_manhattan | 0.6987 |
| pearson_euclidean  | 0.6928 |
| spearman_euclidean | 0.6988 |
| pearson_dot        | 0.6562 |
| spearman_dot       | 0.6553 |
| pearson_max        | 0.7133 |
| spearman_max       | 0.714  |
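
These figures are Pearson and Spearman correlations between the model's pairwise similarity scores and gold similarity labels on an STS development split, computed for cosine, Manhattan, Euclidean, and dot-product scoring ("max" reports the best of the four). A hedged sketch of how such numbers are produced with the library's EmbeddingSimilarityEvaluator; the sentence pairs and gold scores below are placeholders, not the actual evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("huudan123/stage2")

# Placeholder STS-style pairs with gold similarity labels scaled to [0, 1]
sentences1 = ["câu ví_dụ một", "câu ví_dụ hai", "câu ví_dụ ba"]
sentences2 = ["câu so_sánh một", "câu so_sánh hai", "câu so_sánh ba"]
gold_scores = [0.9, 0.5, 0.1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-dev")
results = evaluator(model)
print(results)  # pearson/spearman values for cosine, manhattan, euclidean, and dot similarity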

Training Details

Training Dataset

Unnamed Dataset

  • Size: 254,546 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                             | positive                                           | negative                                          |
    |:--------|:---------------------------------------------------|:---------------------------------------------------|:---------------------------------------------------|
    | type    | string                                              | string                                              | string                                              |
    | details | min: 3 tokens, mean: 14.78 tokens, max: 110 tokens  | min: 3 tokens, mean: 14.78 tokens, max: 110 tokens  | min: 3 tokens, mean: 10.19 tokens, max: 29 tokens   |
  • Samples (each row below concatenates the anchor, positive, and negative texts):
    • conceptualy kem skiming hai kích_thước cơ_bản sản_phẩm địa_lý sản_phẩm địa_lý làm kem skiming làm_việc kem skiming hai tập_trung sản_phẩm địa_lý
    • sản_phẩm địa_lý làm kem skiming làm_việc conceptualy kem skiming hai kích_thước cơ_bản sản_phẩm địa_lý kem skiming hai tập_trung sản_phẩm địa_lý
    • bạn biết trong mùa giải tôi đoán ở mức_độ bạn bạn mất chúng đến mức_độ tiếp_theo họ quyết_định nhớ đội_ngũ cha_mẹ chiến_binh quyết_định gọi nhớ một người ba a một người đàn_ông đi đến thay_thế anh ta một người đàn_ông nào đi thay_thế anh ta recals thực_hiện thứ sáu anh mất mọi thứ ở mức_độ người dân nhớ
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
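
With this loss, every anchor in a batch is trained to score its own positive higher than all other positives and negatives in that batch, using cosine similarity scaled by 20 inside a softmax cross-entropy. A hedged sketch of how the loss is constructed with these parameters (dataset loading and the trainer are omitted):

from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.util import cos_sim

# Training starts from the stage-1 base model
model = SentenceTransformer("huudan123/stage1")

# scale=20.0 multiplies the cosine similarities before the cross-entropy;
# similarity_fct="cos_sim" corresponds to util.cos_sim
train_loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)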
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,660 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor                                             | positive                                           | negative                                          |
    |:--------|:---------------------------------------------------|:---------------------------------------------------|:---------------------------------------------------|
    | type    | string                                              | string                                              | string                                              |
    | details | min: 4 tokens, mean: 13.54 tokens, max: 51 tokens   | min: 4 tokens, mean: 13.54 tokens, max: 51 tokens   | min: 3 tokens, mean: 8.78 tokens, max: 22 tokens    |
  • Samples (each row below concatenates the anchor, positive, and negative texts):
    • anh ấy nói mẹ con về nhà xuống xe_buýt trường anh ấy gọi mẹ anh nói mẹ anh về nhà
    • xuống xe_buýt trường anh ấy gọi mẹ anh ấy nói mẹ con về nhà anh nói mẹ anh về nhà
    • tôi biết mình hướng tới mục_đích báo_cáo một địa_chỉ ở washington tôi bao_giờ đến washington tôi chỉ_định ở tôi lạc cố_gắng tìm tôi hoàn_toàn chắc_chắn tôi làm tôi đi đến washington tôi giao báo_cáo
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • eval_strategy: epoch
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.05
  • fp16: True
  • load_best_model_at_end: True
  • gradient_checkpointing: True
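
A sketch of how these non-default values would map onto SentenceTransformerTrainingArguments in Sentence Transformers 3.x; the output directory is a placeholder, and everything not set here keeps the defaults listed under "All Hyperparameters" below:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="stage2",                # placeholder path
    overwrite_output_dir=True,
    eval_strategy="epoch",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    num_train_epochs=20,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
)

These arguments would then be passed to a SentenceTransformerTrainer together with the triplet datasets, the MultipleNegativesRankingLoss shown above, and the semantic-similarity evaluator.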

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step    Training Loss    Validation Loss    sts-dev_spearman_cosine
0 0 - - 0.5307
0.0503 50 9.1742 - -
0.1005 100 5.9716 - -
0.1508 150 4.6737 - -
0.2010 200 3.2819 - -
0.2513 250 2.8832 - -
0.3015 300 2.7327 - -
0.3518 350 2.6305 - -
0.4020 400 2.6239 - -
0.4523 450 2.5527 - -
0.5025 500 2.5271 - -
0.5528 550 2.4904 - -
0.6030 600 2.4987 - -
0.6533 650 2.4009 - -
0.7035 700 2.3944 - -
0.7538 750 2.5054 - -
0.8040 800 2.3989 - -
0.8543 850 2.4019 - -
0.9045 900 2.3638 - -
0.9548 950 2.3478 - -
1.0 995 - 3.0169 0.7322
1.0050 1000 2.4424 - -
1.0553 1050 2.2478 - -
1.1055 1100 2.2448 - -
1.1558 1150 2.205 - -
1.2060 1200 2.1811 - -
1.2563 1250 2.1794 - -
1.3065 1300 2.1495 - -
1.3568 1350 2.1548 - -
1.4070 1400 2.1299 - -
1.4573 1450 2.1335 - -
1.5075 1500 2.1388 - -
1.5578 1550 2.0999 - -
1.6080 1600 2.0859 - -
1.6583 1650 2.0959 - -
1.7085 1700 2.0334 - -
1.7588 1750 2.0647 - -
1.8090 1800 2.0261 - -
1.8593 1850 2.0133 - -
1.9095 1900 2.0517 - -
1.9598 1950 2.0152 - -
2.0 1990 - 3.1210 0.7187
2.0101 2000 1.924 - -
2.0603 2050 1.7472 - -
2.1106 2100 1.7485 - -
2.1608 2150 1.7536 - -
2.2111 2200 1.751 - -
2.2613 2250 1.7172 - -
2.3116 2300 1.7269 - -
2.3618 2350 1.7352 - -
2.4121 2400 1.7019 - -
2.4623 2450 1.7278 - -
2.5126 2500 1.7046 - -
2.5628 2550 1.6962 - -
2.6131 2600 1.6881 - -
2.6633 2650 1.6806 - -
2.7136 2700 1.6614 - -
2.7638 2750 1.6918 - -
2.8141 2800 1.6794 - -
2.8643 2850 1.6708 - -
2.9146 2900 1.6531 - -
2.9648 2950 1.6236 - -
3.0 2985 - 3.2556 0.7140
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}