---
pipeline_tag: sentence-similarity
language: en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# hkunlp/instructor-base
This is a general embedding model: it maps any piece of text (e.g., a title, a sentence, or a document) to a fixed-length vector at test time, without further training. With instructions, the embeddings become domain-specific (e.g., specialized for science or finance) and task-aware (e.g., customized for classification or information retrieval).
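
The instructions in this card's examples share a pattern, "Represent the {domain} {text type}; Input:", where the domain (e.g. Science, Financial) and the text type (e.g. title, sentence, statement) change with the use case. The sketch below composes such strings; `make_instruction` is a hypothetical helper, and the template is inferred from the examples here rather than being a documented API.

```python
def make_instruction(domain: str, text_type: str) -> str:
    """Build an instruction string in the style of this card's examples.

    Hypothetical helper: the template is inferred from examples such as
    "Represent the Science title; Input:" and is not a documented API.
    """
    return f"Represent the {domain} {text_type}; Input:"


# Vary the domain and the text type to match your data.
science_title = make_instruction("Science", "title")
finance_statement = make_instruction("Financial", "statement")
print(science_title)      # Represent the Science title; Input:
print(finance_statement)  # Represent the Financial statement; Input:
```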

The model is easy to use with the `sentence-transformers` library, installed from the instructor-embedding repository as described below.

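## Installation
```bash
git clone https://github.com/HKUNLP/instructor-embedding
# Enter the sentence-transformers package inside the cloned repository
cd instructor-embedding/sentence-transformers
# Install it in editable mode
pip install -e .
```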

## Compute your customized embeddings
Then you can use the model as follows to calculate domain-specific and task-aware embeddings:
```python
from sentence_transformers import SentenceTransformer

sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title; Input:"

# Load the model from the Hugging Face Hub
model = SentenceTransformer('hkunlp/instructor-base')

# Each input is an [instruction, text, 0] triple
embeddings = model.encode([[instruction, sentence, 0]])
print(embeddings)
```
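
To embed many texts under the same instruction, the `[instruction, text, 0]` triples can be built in a loop and passed to `model.encode` in a single call. A minimal sketch, assuming the triple format shown in the example; `build_inputs` is a hypothetical helper, and the trailing `0` is copied from the example without further interpretation.

```python
from typing import List, Union

Triple = List[Union[str, int]]


def build_inputs(instruction: str, texts: List[str]) -> List[Triple]:
    """Pair one instruction with each text in the [instruction, text, 0] format.

    Hypothetical helper; the trailing 0 simply mirrors the example above.
    """
    return [[instruction, text, 0] for text in texts]


titles = [
    "3D ActionSLAM: wearable person tracking in multi-floor environments",
    "Parton energy loss in QCD matter",
]
batch = build_inputs("Represent the Science title; Input:", titles)
print(len(batch))   # 2
print(batch[0][0])  # Represent the Science title; Input:

# With the model loaded as above, the whole batch is encoded in one call:
# embeddings = model.encode(batch)
```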

## Calculate sentence similarities
You can also use the customized embeddings to compute similarities between two groups of sentences:
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer('hkunlp/instructor-base')

# Each group pairs a Science sentence with a Financial statement
sentences_a = [['Represent the Science sentence; Input: ', 'Parton energy loss in QCD matter', 0],
               ['Represent the Financial statement; Input: ', 'The Federal Reserve on Wednesday raised its benchmark interest rate.', 0]]
sentences_b = [['Represent the Science sentence; Input: ', 'The Chiral Phase Transition in Dissipative Dynamics', 0],
               ['Represent the Financial statement; Input: ', 'The funds rose less than 0.5 per cent on Friday', 0]]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)

# similarities[i][j] is the cosine similarity of sentences_a[i] and sentences_b[j]
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities)
```
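
`cosine_similarity` returns a matrix whose entry (i, j) is the cosine similarity between row i of the first input and row j of the second. The sketch below reproduces that computation with NumPy on small made-up vectors (stand-ins for real embeddings, not model output), then uses `argmax` to pick, for each sentence in the first group, its most similar sentence in the second group. `cosine_matrix` is an illustrative helper, not part of either library.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`."""
    # Normalize each row to unit length, then take pairwise dot products.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy 3-dimensional vectors standing in for real embeddings.
emb_a = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])
emb_b = np.array([[2.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0]])

sims = cosine_matrix(emb_a, emb_b)
print(sims.round(3))

# Index of the best match in group b for each sentence in group a.
best = sims.argmax(axis=1)
print(best)  # [0 1]
```

Because each row is normalized first, the result depends only on the direction of the vectors, not their length, which is why `[2, 0, 0]` matches `[1, 0, 0]` with similarity 1.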