---
pipeline_tag: sentence-similarity
language: en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# hku-nlp/instructor-base

This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, a document, etc.) to a fixed-length vector at test time **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science, finance, etc.) and **task-aware** (e.g., customized for classification, information retrieval, etc.).

The model is easy to use with the `sentence-transformers` library.

## Installation

```bash
git clone https://github.com/HKUNLP/instructor-embedding
cd instructor-embedding
pip install -e .
```

## Compute your customized embeddings

Then you can use the model to calculate domain-specific and task-aware embeddings:

```python
from sentence_transformers import SentenceTransformer

sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title; Input:"
model = SentenceTransformer('hku-nlp/instructor-base')
# Each input is an [instruction, sentence, 0] triple.
embeddings = model.encode([[instruction, sentence, 0]])
print(embeddings)
```

## Calculate sentence similarities

You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**:

```python
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('hku-nlp/instructor-base')
sentences_a = [['Represent the Science sentence; Input: ', 'Parton energy loss in QCD matter', 0],
               ['Represent the Financial statement; Input: ', 'The Federal Reserve on Wednesday raised its benchmark interest rate.', 0]]
sentences_b = [['Represent the Science sentence; Input: ', 'The Chiral Phase Transition in Dissipative Dynamics', 0],
               ['Represent the Financial statement; Input: ', 'The funds rose less than 0.5 per cent on Friday', 0]]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
similarities = cosine_similarity(embeddings_a, embeddings_b)
print(similarities)
```
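
Since the embeddings are task-aware, the same interface also supports simple retrieval: encode a query and a set of candidate documents, each paired with a task-specific instruction, and rank the candidates by cosine similarity. The sketch below is a minimal illustration assuming the triple-input `encode` API shown above; the query and document instruction wordings are hypothetical examples, not prescribed by this card.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('hku-nlp/instructor-base')

# Hypothetical instructions for an asymmetric retrieval task;
# adjust the wording to your own domain and task.
query = [['Represent the Science question; Input: ',
          'How do partons lose energy in QCD matter?', 0]]
corpus = [['Represent the Science document; Input: ',
           'Parton energy loss in QCD matter', 0],
          ['Represent the Science document; Input: ',
           'The Chiral Phase Transition in Dissipative Dynamics', 0]]

query_embedding = model.encode(query)
corpus_embeddings = model.encode(corpus)

# Rank corpus entries by cosine similarity to the query.
scores = cosine_similarity(query_embedding, corpus_embeddings)[0]
best = int(np.argmax(scores))
print(scores, '-> best match:', corpus[best][1])
```

As in the examples above, the instruction travels with each input, so queries and documents can carry different instructions for the same task.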