nampham1106 committed on
Commit
93d778d
1 Parent(s): 389d5c3

Update README.md

Files changed (1)
  1. README.md +2 -347
README.md CHANGED
@@ -1,20 +1,6 @@
  ---
  language:
- - ar
- - bg
- - de
- - el
- - en
- - es
- - fr
- - hi
- - ru
- - sw
- - th
- - tr
- - ur
  - vi
- - zh
  library_name: sentence-transformers
  tags:
  - sentence-transformers
@@ -201,7 +187,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [B
  - **Similarity Function:** Cosine Similarity
  - **Training Dataset:**
      - [facebook/xnli](https://huggingface.co/datasets/facebook/xnli)
- - **Languages:** ar, bg, de, el, en, es, fr, hi, ru, sw, th, tr, ur, vi, zh
+ - **Languages:** vi
  <!-- - **License:** Unknown -->

  ### Model Sources
@@ -235,7 +221,7 @@ Then you can load this model and run inference.
  from sentence_transformers import SentenceTransformer

  # Download from the 🤗 Hub
- model = SentenceTransformer("matryoshka_nli_BookingCare-bkcare-bert-pretrained-2024-07-19_04-21-48")
+ model = SentenceTransformer("nampham1106/bkcare-text-emb-v1.0")
  # Run inference
  sentences = [
      'Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .',
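
The snippet is cut off at the hunk boundary above; for context, a minimal end-to-end inference sketch. The second sentence and the similarity step are illustrative assumptions rather than the card's own code, and `model.similarity` is the Sentence Transformers 3.x helper:

```python
from sentence_transformers import SentenceTransformer

# Model id taken from the + line in this hunk
model = SentenceTransformer("nampham1106/bkcare-text-emb-v1.0")

sentences = [
    "Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .",
    # Placeholder second sentence; the card's full list is truncated above
    "Julius đặt khẩu súng xuống .",
]

# Encode both sentences, then compute the pairwise cosine-similarity matrix
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities)  # 2x2 tensor; the diagonal is ~1.0
```
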
@@ -313,334 +299,3 @@ You can finetune this model on your own dataset.
  | spearman_dot | 0.6631 |
  | pearson_max | 0.6851 |
  | spearman_max | 0.6695 |
-
- #### Semantic Similarity
- * Dataset: `sts-dev-256`
- * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
-
- | Metric              | Value      |
- |:--------------------|:-----------|
- | pearson_cosine      | 0.6725     |
- | **spearman_cosine** | **0.6576** |
- | pearson_manhattan   | 0.6698     |
- | spearman_manhattan  | 0.6645     |
- | pearson_euclidean   | 0.672      |
- | spearman_euclidean  | 0.667      |
- | pearson_dot         | 0.6476     |
- | spearman_dot        | 0.6294     |
- | pearson_max         | 0.6725     |
- | spearman_max        | 0.667      |
-
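The `sts-dev-256` figures above come from scoring with embeddings truncated to the first 256 Matryoshka dimensions. A minimal sketch of how such an evaluation can be set up, assuming placeholder STS-style pairs and using the Sentence Transformers 3.x `truncate_dim` option:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Truncate every embedding to its first 256 dimensions at load time
model = SentenceTransformer("nampham1106/bkcare-text-emb-v1.0", truncate_dim=256)

# Placeholder sentence pairs with gold similarity scores in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["Một người đàn ông đang chơi đàn .", "Trời đang mưa ."],
    sentences2=["Một người đang chơi nhạc cụ .", "Trời nắng ."],
    scores=[0.9, 0.1],
    name="sts-dev-256",
)

# Returns Pearson/Spearman correlations for cosine, Euclidean, Manhattan and dot
results = evaluator(model)
print(results)
```
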
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
- ## Training Details
-
- ### Training Dataset
-
- #### facebook/xnli
-
- * Dataset: [facebook/xnli](https://huggingface.co/datasets/facebook/xnli) at [b8dd5d7](https://huggingface.co/datasets/facebook/xnli/tree/b8dd5d7af51114dbda02c0e3f6133f332186418e)
- * Size: 388,774 training samples
- * Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
- * Approximate statistics based on the first 1000 samples:
-   |         | premise | hypothesis | label |
-   |:--------|:--------|:-----------|:------|
-   | type    | string  | string     | int   |
-   | details | <ul><li>min: 3 tokens</li><li>mean: 29.98 tokens</li><li>max: 309 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 15.64 tokens</li><li>max: 61 tokens</li></ul> | <ul><li>0: ~33.10%</li><li>1: ~35.60%</li><li>2: ~31.30%</li></ul> |
- * Samples:
-   | premise | hypothesis | label |
-   |:--------|:-----------|:------|
-   | <code>Những rắc rối với loại phân tích chi tiết này có nghĩa là bất kỳ nghệ nhân nào có thể nghiên cứu kỹ thuật của người nghệ thuật và tái tạo chúng -- sự chuẩn bị của hoffman .</code> | <code>Sự tái tạo là một quá trình dễ dàng .</code> | <code>2</code> |
-   | <code>Đó là một sự quan sát tỉnh rượu , để nhận ra rằng 80 phần trăm của những người cần sự giúp đỡ pháp lý bị từ chối những hướng dẫn và luật sự .</code> | <code>80 % những người cần sự trợ giúp pháp lý bị từ chối những hướng dẫn mà họ đang tìm kiếm , và đây là một suy nghĩ tỉnh rượu .</code> | <code>0</code> |
-   | <code>Đi qua cái để tìm nhà thờ của những hình xăm egios .</code> | <code>Nếu anh đi qua cái , anh sẽ tìm thấy mình ở bờ vực của thị trấn , không có gì ngoài nông thôn bên kia .</code> | <code>2</code> |
- * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
-   ```json
-   {
-       "loss": "MultipleNegativesRankingLoss",
-       "matryoshka_dims": [
-           768,
-           512,
-           256
-       ],
-       "matryoshka_weights": [
-           1,
-           1,
-           1
-       ],
-       "n_dims_per_step": -1
-   }
-   ```
-
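In code, these parameters correspond to wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss`. A minimal sketch; the base-model id is a placeholder, since the exact source checkpoint is truncated in the hunk above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Placeholder for the base checkpoint this card was finetuned from
model = SentenceTransformer("BookingCare/bkcare-bert-pretrained")

# Inner loss: in-batch negatives over (premise, hypothesis) pairs
inner_loss = MultipleNegativesRankingLoss(model)

# Apply the same loss at 768-, 512- and 256-dimensional truncations of each
# embedding, weighted equally; n_dims_per_step=-1 trains all dims every step
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256],
    matryoshka_weights=[1, 1, 1],
    n_dims_per_step=-1,
)
```
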
- ### Evaluation Dataset
-
- #### facebook/xnli
-
- * Dataset: [facebook/xnli](https://huggingface.co/datasets/facebook/xnli) at [b8dd5d7](https://huggingface.co/datasets/facebook/xnli/tree/b8dd5d7af51114dbda02c0e3f6133f332186418e)
- * Size: 3,928 evaluation samples
- * Columns: <code>premise</code>, <code>hypothesis</code>, and <code>label</code>
- * Approximate statistics based on the first 1000 samples:
-   |         | premise | hypothesis | label |
-   |:--------|:--------|:-----------|:------|
-   | type    | string  | string     | int   |
-   | details | <ul><li>min: 4 tokens</li><li>mean: 32.3 tokens</li><li>max: 163 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 15.73 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>0: ~32.40%</li><li>1: ~33.50%</li><li>2: ~34.10%</li></ul> |
- * Samples:
-   | premise | hypothesis | label |
-   |:--------|:-----------|:------|
-   | <code>Hai xu mắt anh ta warily .</code> | <code>Hai xu không nhìn anh ta .</code> | <code>2</code> |
-   | <code>Một không khí chung của glee permeated tất cả mọi người .</code> | <code>Mọi thứ đều cảm thấy hạnh phúc .</code> | <code>0</code> |
-   | <code>Tuy nhiên , một sự chắc chắn là dân số hoa kỳ đã bị lão hóa và sẽ có ít công nhân hỗ trợ mỗi người nghỉ hưu .</code> | <code>Trạng Thái lão hóa của dân số hoa kỳ được coi là một sự không chắc chắn .</code> | <code>2</code> |
- * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
-   ```json
-   {
-       "loss": "MultipleNegativesRankingLoss",
-       "matryoshka_dims": [
-           768,
-           512,
-           256
-       ],
-       "matryoshka_weights": [
-           1,
-           1,
-           1
-       ],
-       "n_dims_per_step": -1
-   }
-   ```
-
- ### Training Hyperparameters
- #### Non-Default Hyperparameters
-
- - `eval_strategy`: steps
- - `per_device_train_batch_size`: 32
- - `per_device_eval_batch_size`: 32
- - `learning_rate`: 2e-05
- - `num_train_epochs`: 1
- - `warmup_ratio`: 0.1
- - `fp16`: True
- - `batch_sampler`: no_duplicates
-
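These values map one-to-one onto `SentenceTransformerTrainingArguments`; a sketch with a placeholder `output_dir`:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/bkcare-text-emb",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    # no_duplicates keeps repeated sentences out of a batch, avoiding false
    # negatives for MultipleNegativesRankingLoss
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```
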
- #### All Hyperparameters
- <details><summary>Click to expand</summary>
-
- - `overwrite_output_dir`: False
- - `do_predict`: False
- - `eval_strategy`: steps
- - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 32
- - `per_device_eval_batch_size`: 32
- - `per_gpu_train_batch_size`: None
- - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 1
- - `eval_accumulation_steps`: None
- - `learning_rate`: 2e-05
- - `weight_decay`: 0.0
- - `adam_beta1`: 0.9
- - `adam_beta2`: 0.999
- - `adam_epsilon`: 1e-08
- - `max_grad_norm`: 1.0
- - `num_train_epochs`: 1
- - `max_steps`: -1
- - `lr_scheduler_type`: linear
- - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.1
- - `warmup_steps`: 0
- - `log_level`: passive
- - `log_level_replica`: warning
- - `log_on_each_node`: True
- - `logging_nan_inf_filter`: True
- - `save_safetensors`: True
- - `save_on_each_node`: False
- - `save_only_model`: False
- - `restore_callback_states_from_checkpoint`: False
- - `no_cuda`: False
- - `use_cpu`: False
- - `use_mps_device`: False
- - `seed`: 42
- - `data_seed`: None
- - `jit_mode_eval`: False
- - `use_ipex`: False
- - `bf16`: False
- - `fp16`: True
- - `fp16_opt_level`: O1
- - `half_precision_backend`: auto
- - `bf16_full_eval`: False
- - `fp16_full_eval`: False
- - `tf32`: None
- - `local_rank`: 0
- - `ddp_backend`: None
- - `tpu_num_cores`: None
- - `tpu_metrics_debug`: False
- - `debug`: []
- - `dataloader_drop_last`: False
- - `dataloader_num_workers`: 0
- - `dataloader_prefetch_factor`: None
- - `past_index`: -1
- - `disable_tqdm`: False
- - `remove_unused_columns`: True
- - `label_names`: None
- - `load_best_model_at_end`: False
- - `ignore_data_skip`: False
- - `fsdp`: []
- - `fsdp_min_num_params`: 0
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- - `fsdp_transformer_layer_cls_to_wrap`: None
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- - `deepspeed`: None
- - `label_smoothing_factor`: 0.0
- - `optim`: adamw_torch
- - `optim_args`: None
- - `adafactor`: False
- - `group_by_length`: False
- - `length_column_name`: length
- - `ddp_find_unused_parameters`: None
- - `ddp_bucket_cap_mb`: None
- - `ddp_broadcast_buffers`: False
- - `dataloader_pin_memory`: True
- - `dataloader_persistent_workers`: False
- - `skip_memory_metrics`: True
- - `use_legacy_prediction_loop`: False
- - `push_to_hub`: False
- - `resume_from_checkpoint`: None
- - `hub_model_id`: None
- - `hub_strategy`: every_save
- - `hub_private_repo`: False
- - `hub_always_push`: False
- - `gradient_checkpointing`: False
- - `gradient_checkpointing_kwargs`: None
- - `include_inputs_for_metrics`: False
- - `eval_do_concat_batches`: True
- - `fp16_backend`: auto
- - `push_to_hub_model_id`: None
- - `push_to_hub_organization`: None
- - `mp_parameters`:
- - `auto_find_batch_size`: False
- - `full_determinism`: False
- - `torchdynamo`: None
- - `ray_scope`: last
- - `ddp_timeout`: 1800
- - `torch_compile`: False
- - `torch_compile_backend`: None
- - `torch_compile_mode`: None
- - `dispatch_batches`: None
- - `split_batches`: None
- - `include_tokens_per_second`: False
- - `include_num_input_tokens_seen`: False
- - `neftune_noise_alpha`: None
- - `optim_target_modules`: None
- - `batch_eval_metrics`: False
- - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: proportional
-
- </details>
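
Putting the pieces together, a hedged end-to-end training sketch. The entailment-only pair conversion is an assumption about how the NLI data was prepared (the card does not show it), the base-model id and `output_dir` are placeholders, and the loss and arguments mirror the sections above:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BookingCare/bkcare-bert-pretrained")  # placeholder id

def entailment_pairs(split: str):
    # MultipleNegativesRankingLoss expects (anchor, positive) pairs, so keep
    # only entailment rows (label == 0) and drop the label column
    ds = load_dataset("facebook/xnli", "vi", split=split)
    return ds.filter(lambda ex: ex["label"] == 0).remove_columns("label")

loss = MatryoshkaLoss(
    model, MultipleNegativesRankingLoss(model), matryoshka_dims=[768, 512, 256]
)

args = SentenceTransformerTrainingArguments(
    output_dir="output/bkcare-text-emb",  # placeholder; full mapping above
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=entailment_pairs("train"),
    eval_dataset=entailment_pairs("validation"),
    loss=loss,
)
trainer.train()  # produces logs like the table below
```
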
-
- ### Training Logs
- | Epoch  | Step | Training Loss | loss   | sts-dev-256_spearman_cosine | sts-dev-512_spearman_cosine | sts-dev-768_spearman_cosine |
- |:------:|:----:|:-------------:|:------:|:---------------------------:|:---------------------------:|:---------------------------:|
- | 0      | 0    | -             | -      | 0.5425                      | 0.5569                      | 0.5593                      |
- | 0.0494 | 300  | 5.6741        | -      | -                           | -                           | -                           |
- | 0.0823 | 500  | -             | 2.9876 | 0.6417                      | 0.6479                      | 0.6502                      |
- | 0.0988 | 600  | 3.5541        | -      | -                           | -                           | -                           |
- | 0.1481 | 900  | 2.9032        | -      | -                           | -                           | -                           |
- | 0.1646 | 1000 | -             | 2.3400 | 0.6526                      | 0.6565                      | 0.6591                      |
- | 0.1975 | 1200 | 2.6495        | -      | -                           | -                           | -                           |
- | 0.2469 | 1500 | 2.426         | 2.1092 | 0.6359                      | 0.6466                      | 0.6501                      |
- | 0.2963 | 1800 | 2.2969        | -      | -                           | -                           | -                           |
- | 0.3292 | 2000 | -             | 1.9556 | 0.6390                      | 0.6491                      | 0.6516                      |
- | 0.3457 | 2100 | 2.1003        | -      | -                           | -                           | -                           |
- | 0.3951 | 2400 | 2.0975        | -      | -                           | -                           | -                           |
- | 0.4115 | 2500 | -             | 1.8133 | 0.6585                      | 0.6681                      | 0.6709                      |
- | 0.4444 | 2700 | 2.0403        | -      | -                           | -                           | -                           |
- | 0.4938 | 3000 | 1.9421        | 1.7629 | 0.6415                      | 0.6515                      | 0.6540                      |
- | 0.5432 | 3300 | 1.9313        | -      | -                           | -                           | -                           |
- | 0.5761 | 3500 | -             | 1.6924 | 0.6577                      | 0.6660                      | 0.6673                      |
- | 0.5926 | 3600 | 1.8582        | -      | -                           | -                           | -                           |
- | 0.6420 | 3900 | 1.8203        | -      | -                           | -                           | -                           |
- | 0.6584 | 4000 | -             | 1.6263 | 0.6527                      | 0.6620                      | 0.6635                      |
- | 0.6914 | 4200 | 1.8281        | -      | -                           | -                           | -                           |
- | 0.7407 | 4500 | 1.8037        | 1.5776 | 0.6572                      | 0.6677                      | 0.6685                      |
- | 0.7901 | 4800 | 1.7771        | -      | -                           | -                           | -                           |
- | 0.8230 | 5000 | -             | 1.5571 | 0.6548                      | 0.6652                      | 0.6665                      |
- | 0.8395 | 5100 | 1.7427        | -      | -                           | -                           | -                           |
- | 0.8889 | 5400 | 1.6901        | -      | -                           | -                           | -                           |
- | 0.9053 | 5500 | -             | 1.5385 | 0.6604                      | 0.6707                      | 0.6717                      |
- | 0.9383 | 5700 | 1.7977        | -      | -                           | -                           | -                           |
- | 0.9877 | 6000 | 1.6838        | 1.5279 | 0.6576                      | 0.6686                      | 0.6701                      |
-
-
- ### Framework Versions
- - Python: 3.10.13
- - Sentence Transformers: 3.0.1
- - Transformers: 4.41.2
- - PyTorch: 2.1.2
- - Accelerate: 0.30.1
- - Datasets: 2.19.2
- - Tokenizers: 0.19.1
-
- ## Citation
-
- ### BibTeX
-
- #### Sentence Transformers
- ```bibtex
- @inproceedings{reimers-2019-sentence-bert,
-     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
-     author = "Reimers, Nils and Gurevych, Iryna",
-     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
-     month = "11",
-     year = "2019",
-     publisher = "Association for Computational Linguistics",
-     url = "https://arxiv.org/abs/1908.10084",
- }
- ```
-
- #### MatryoshkaLoss
- ```bibtex
- @misc{kusupati2024matryoshka,
-     title={Matryoshka Representation Learning},
-     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
-     year={2024},
-     eprint={2205.13147},
-     archivePrefix={arXiv},
-     primaryClass={cs.LG}
- }
- ```
-
- #### MultipleNegativesRankingLoss
- ```bibtex
- @misc{henderson2017efficient,
-     title={Efficient Natural Language Response Suggestion for Smart Reply},
-     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
-     year={2017},
-     eprint={1705.00652},
-     archivePrefix={arXiv},
-     primaryClass={cs.CL}
- }
- ```
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->