multi-train committed on
Commit 51e835d
Parent: 3f6f495

Update README.md

Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -10,10 +10,12 @@ tags:
---

# hkunlp/instructor-xl
- This is a general embedding model: it maps **any** piece of text (e.g., a title, a sentence, a document, etc.) to a fixed-length vector at test time **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science, finance, etc.) and **task-aware** (e.g., customized for classification, information retrieval, etc.).
-
+ We introduce **Instructor**👨‍🏫, an instruction-finetuned text embedding model that can generate text embeddings tailored to any task (e.g., classification, retrieval, clustering, text evaluation, etc.) and domain (e.g., science, finance, etc.) ***by simply providing the task instruction, without any finetuning***. Instructor👨‍🏫 achieves state-of-the-art performance on 70 diverse embedding tasks!
The model is easy to use with the `sentence-transformer` library.

+ ## Quick start
+ <hr />
+
## Installation
```bash
git clone https://github.com/HKUNLP/instructor-embedding
@@ -32,14 +34,25 @@ embeddings = model.encode([[instruction,sentence,0]])
print(embeddings)
```

+ ## Use cases
+ <hr />
+
+ ## Calculate embeddings for your customized texts
+ If you want to calculate customized embeddings for specific sentences, you may follow the unified template to write instructions:
+
+ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Represent the `domain` `text_type` for `task_objective`; Input:
+ * `domain` is optional, and it specifies the domain of the text, e.g., science, finance, medicine, etc.
+ * `text_type` is required, and it specifies the encoding unit, e.g., sentence, document, paragraph, etc.
+ * `task_objective` is optional, and it specifies the objective of embedding, e.g., retrieve a document, classify the sentence, etc.
+
## Calculate sentence similarities
You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**.
```python
from sklearn.metrics.pairwise import cosine_similarity
sentences_a = [['Represent the Science sentence; Input: ','Parton energy loss in QCD matter',0],
- ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]
+ ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]]
sentences_b = [['Represent the Science sentence; Input: ','The Chiral Phase Transition in Dissipative Dynamics', 0],
- ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]
+ ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
similarities = cosine_similarity(embeddings_a,embeddings_b)
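
To make the unified template concrete, here is a minimal sketch of writing one instruction and encoding with it. Only the `model.encode([[instruction,sentence,0]])` call is visible in the diff above; the `INSTRUCTOR` import, the model-loading line, and the example instruction/text pair below are assumptions based on the instructor-embedding repository linked in the Installation section.

```python
# Minimal sketch of the unified template:
#   Represent the `domain` `text_type` for `task_objective`; Input:
# The INSTRUCTOR class and checkpoint name are assumptions taken from
# https://github.com/HKUNLP/instructor-embedding; only the
# model.encode([[instruction, text, 0]]) call appears in the diff itself.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-xl')

# domain = "Medicine" (optional), text_type = "document" (required),
# task_objective = "retrieval" (optional); example pair is illustrative.
instruction = 'Represent the Medicine document for retrieval; Input: '
text = 'Aspirin reduces the risk of heart attack by thinning the blood.'

embedding = model.encode([[instruction, text, 0]])
print(embedding.shape)  # (1, embedding_dim): one fixed-length vector per input
```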
 
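A usage note on the similarity snippet: `cosine_similarity` returns a 2-D NumPy array whose entry `[i, j]` compares `sentences_a[i]` with `sentences_b[j]`. The sketch below continues from the variables in the block above and shows one illustrative way to read the matrix; the argmax retrieval step is not part of the commit.

```python
# Continues from the snippet above: similarities has shape
# (len(sentences_a), len(sentences_b)); row i holds the cosine
# similarities between sentences_a[i] and every sentence in sentences_b.
import numpy as np

best = np.argmax(similarities, axis=1)  # closest b-sentence for each a-sentence
for i, j in enumerate(best):
    print(f'a[{i}] best matches b[{j}] (cosine similarity {similarities[i, j]:.3f})')
```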