bkcare-embed-text-v1.0 / README.md

nampham1106

Update README.md

93d778d verified 9 days ago


---
language:
- vi
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:388774
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: BookingCare/bkcare-bert-pretrained
datasets:
- facebook/xnli
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
widget:
- source_sentence: Như bằng chứng về việc này , cô ta đã chi tiết các tài sản bầu
    cử của clinton theo tiểu bang , ở phía đông bắc , Trung Tây , và tây .
  sentences:
  - Bộ chọn ứng cử viên không vui chơi ở các bữa tiệc .
  - Sử dụng công nghệ thông tin cho phép sử dụng các nguồn tài nguyên liên lạc lớn
    hơn .
  - Không bao giờ có một tài khoản kỹ lưỡng của các cuộc bầu cử của clinton .
- source_sentence: Sau một thời gian , ông ấy ngừng bò và ngồi lên .
  sentences:
  - Jon muốn có một trận đấu lớn để bắt đầu .
  - Tất cả mọi người đều được đưa ra một tách trung quốc vào đầu năm .
  - Anh ta bị thương nghiêm trọng .
- source_sentence: Arras đã nổi tiếng trong thời trung cổ cho tác phẩm của vải và
    những tấm thảm treo cổ , loại thông qua mà polonius gặp phải cái chết của ông
    ta ở hamlet .
  sentences:
  - Lũ lụt đang dự kiến đã gây ra 1.5 tỷ đô la trong thiệt hại .
  - Nó sẽ là bắt buộc cho những người nghèo khổ vì những quy định .
  - Arras chỉ làm đồ gốm thôi .
- source_sentence: Lehrer là người về sự giao tiếp này với gió và quyền lực , và nó
    đã biến anh ta thành một trong số họ .
  sentences:
  - Người đã làm julius cảm thấy lo lắng .
  - Họ có thể mất 36 tháng để hoàn thành .
  - Leher không thích giao tiếp với các chính trị gia .
- source_sentence: Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .
  sentences:
  - Tôi sẽ ban cho anh những lời chúc của anh , julius bỏ súng xuống .
  - Bạn có thể được đề nghị giả ngọc , điều đó rất tương tự với các đối tác cao hơn
    của nó .
  - Nó đến trong túi 400 pound .
pipeline_tag: sentence-similarity
model-index:
- name: SentenceTransformer based on BookingCare/bkcare-bert-pretrained
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 768
      type: sts-dev-768
    metrics:
    - type: pearson_cosine
      value: 0.6867482534374487
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6700553964995389
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6734129943367082
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6689701652447698
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6743893025028618
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6700560677966448
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6867482521687218
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6700558146434896
      name: Spearman Dot
    - type: pearson_max
      value: 0.6867482534374487
      name: Pearson Max
    - type: spearman_max
      value: 0.6700560677966448
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 512
      type: sts-dev-512
    metrics:
    - type: pearson_cosine
      value: 0.6850905517919458
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6685671393301956
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6726989775543833
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6682515030981849
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6739395873419184
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6695224924884773
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6802500913119895
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6631065723741826
      name: Spearman Dot
    - type: pearson_max
      value: 0.6850905517919458
      name: Pearson Max
    - type: spearman_max
      value: 0.6695224924884773
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev 256
      type: sts-dev-256
    metrics:
    - type: pearson_cosine
      value: 0.6725154983351178
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6575647130100782
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.6697743652714089
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.6645201863227755
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.6719730940115203
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.6669909427123673
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6475732494643994
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6294359395183124
      name: Spearman Dot
    - type: pearson_max
      value: 0.6725154983351178
      name: Pearson Max
    - type: spearman_max
      value: 0.6669909427123673
      name: Spearman Max
---

# SentenceTransformer based on BookingCare/bkcare-bert-pretrained

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [BookingCare/bkcare-bert-pretrained](https://huggingface.co/BookingCare/bkcare-bert-pretrained) on the [facebook/xnli](https://huggingface.co/datasets/facebook/xnli) dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: [BookingCare/bkcare-bert-pretrained](https://huggingface.co/BookingCare/bkcare-bert-pretrained) <!-- at revision f869851286af65b3dbe0541a14fc5d3d2bb6c95d -->
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
    - [facebook/xnli](https://huggingface.co/datasets/facebook/xnli)
- Language: vi (Vietnamese)
<!-- - License: Unknown -->
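
For reference, the cosine similarity of two embeddings `u` and `v` is the standard quantity below. Because the architecture ends with a `Normalize()` module, every output vector has unit length, so the score reduces to a plain dot product. This is consistent with `pearson_cosine` and `pearson_dot` agreeing to four decimal places (0.6867) in the 768-dimensional evaluation.

$$
\cos(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert_2 \, \lVert \mathbf{v} \rVert_2},
\qquad
\lVert \mathbf{u} \rVert_2 = \lVert \mathbf{v} \rVert_2 = 1 \;\Longrightarrow\; \cos(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{768} u_i v_i
$$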

### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
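
The Pooling and Normalize modules above can be approximated in a few lines of NumPy. This is a minimal sketch, not the library's implementation: it mirrors the configuration shown (`pooling_mode_mean_tokens: True`, followed by `Normalize()`), the random `tokens` array is a hypothetical stand-in for the BertModel token outputs, and `mean_pool` and `l2_normalize` are illustrative helper names.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0).

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length, as the Normalize() module does."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


# Hypothetical stand-in for BertModel output: 2 sequences, 4 token positions, 768 dims.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 0],   # last position of the first sequence is padding
                 [1, 1, 1, 1]])

sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)                    # (2, 768)
print(np.linalg.norm(sentence_embeddings, axis=1))  # both norms are approximately 1.0
```

Masking before averaging matters: without it, padding positions would pull the sentence vector toward whatever the encoder emits for pad tokens.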

## Usage

### Direct Usage (Sentence Transformers)

First, install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then load the model and run inference (`model.similarity` requires Sentence Transformers v3.0 or later):

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nampham1106/bkcare-text-emb-v1.0")
# Run inference
sentences = [
    # "I will do whatever you want." Julius lowers the pistol.
    'Tôi sẽ làm tất cả những gì ông muốn. julius hạ khẩu súng lục .',
    # "I will grant you your wishes," Julius puts the gun down.
    'Tôi sẽ ban cho anh những lời chúc của anh , julius bỏ súng xuống .',
    # It comes in 400-pound bags.
    'Nó đến trong túi 400 pound .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
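
Because this model's embeddings are unit length (see the `Normalize()` module), a basic semantic search over precomputed embeddings reduces to a matrix-vector product followed by a sort. The sketch below assumes the embeddings were already produced by `model.encode`; the toy 3-dimensional arrays are hypothetical stand-ins for the real 768-dimensional outputs, and `top_k_cosine` is an illustrative helper, not part of the library. For real workloads, `sentence_transformers.util.semantic_search` covers the same need.

```python
import numpy as np


def top_k_cosine(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Return (index, score) pairs for the k corpus rows most similar to the query.

    Assumes both inputs are already L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = corpus_emb @ query_emb      # one score per corpus row
    order = np.argsort(-scores)[:k]      # highest score first
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical unit-length stand-ins for model.encode(...) outputs.
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.6, 0.8, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([0.8, 0.6, 0.0])

print(top_k_cosine(query, corpus, k=2))  # corpus row 1 ranks first, then row 0
```

If the embeddings were not normalized, the scores would need dividing by the vector norms first; skipping that step is only safe because of the final `Normalize()` module.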


## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-dev-768`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.6867 |
| spearman_cosine    | 0.6701 |
| pearson_manhattan  | 0.6734 |
| spearman_manhattan | 0.6690 |
| pearson_euclidean  | 0.6744 |
| spearman_euclidean | 0.6701 |
| pearson_dot        | 0.6867 |
| spearman_dot       | 0.6701 |
| pearson_max        | 0.6867 |
| spearman_max       | 0.6701 |
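
The Pearson rows measure linear agreement between the model's similarity scores and the gold similarity labels, and the Spearman rows measure agreement between their rankings; as the table shows, each `_max` row is the best of the four similarity functions. The NumPy sketch below is an illustrative re-implementation of the two statistics, not the evaluator's own code. The `gold` and `cosine_scores` arrays are hypothetical toy data, and the Spearman helper assumes there are no tied values.

```python
import numpy as np


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: covariance of x and y over the product of their spreads."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation: the Pearson correlation of the ranks.

    Simplification: assumes no ties (tied values would need average ranks).
    """
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return pearson(rank_x, rank_y)


# Hypothetical toy data: gold labels and model cosine scores for 5 sentence pairs.
gold = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
cosine_scores = np.array([0.10, 0.35, 0.30, 0.70, 0.90])

print(round(pearson(cosine_scores, gold), 4))
print(round(spearman(cosine_scores, gold), 4))
```

Here the model swaps the order of pairs 2 and 3, so the Spearman score drops below 1 even though most of the ranking is correct.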

#### Semantic Similarity
* Dataset: `sts-dev-512`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.6851 |
| spearman_cosine    | 0.6686 |
| pearson_manhattan  | 0.6727 |
| spearman_manhattan | 0.6683 |
| pearson_euclidean  | 0.6739 |
| spearman_euclidean | 0.6695 |
| pearson_dot        | 0.6803 |
| spearman_dot       | 0.6631 |
| pearson_max        | 0.6851 |
| spearman_max       | 0.6695 |
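
The model was trained with `MatryoshkaLoss` (listed in the front-matter tags), and the `sts-dev-512` results above are reported for a shorter embedding. A common way to obtain such an embedding is to keep the first `d` components and re-normalize. The sketch below assumes that this is how the shorter vectors are formed (the card does not say so explicitly); `truncate_and_renormalize` is a hypothetical helper and the random `full` array stands in for real model outputs. Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`. Without re-normalization, truncated vectors are no longer unit length, so dot-product and cosine scores diverge; this would be consistent with `pearson_dot` (0.6803) falling below `pearson_cosine` (0.6851) in the 512-dimensional table.

```python
import numpy as np


def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then rescale rows to unit length."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


# Hypothetical stand-in for unit-length 768-dimensional model outputs.
rng = np.random.default_rng(42)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 512)
print(small.shape)  # (3, 512)

# Plain truncation breaks unit length, so the dot product no longer equals cosine.
raw = full[:, :512]
print(np.linalg.norm(raw, axis=1))    # each norm is below 1
print(np.linalg.norm(small, axis=1))  # each norm is approximately 1
```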