hellonlp committed on
Commit 5281659 (1 parent: 649b9ac)

Update README.md

Files changed (1): README.md +46 -0
README.md CHANGED
@@ -3,3 +3,49 @@ language:
  - zh
  license: mit
  ---

## Uses

You can use our model to encode sentences into embeddings:
```python
import torch
from transformers import BertTokenizer, BertModel
from sklearn.metrics.pairwise import cosine_similarity  # used in the similarity example below


# Load the tokenizer and model
simcse_sup_path = "hellonlp/simcse-roberta-base-zh"
tokenizer = BertTokenizer.from_pretrained(simcse_sup_path)
MODEL = BertModel.from_pretrained(simcse_sup_path)


def get_vector_simcse(sentence):
    """
    Predict the SimCSE semantic vector (sentence embedding) of a sentence.
    """
    # encode() adds the [CLS]/[SEP] special tokens; unsqueeze(0) adds a batch dimension
    input_ids = torch.tensor(tokenizer.encode(sentence)).unsqueeze(0)
    with torch.no_grad():  # inference only, no gradients needed
        output = MODEL(input_ids)
    # Take the final hidden state of the [CLS] token as the sentence embedding
    return output.last_hidden_state[:, 0].squeeze(0)


embeddings = get_vector_simcse("武汉是一个美丽的城市。")  # "Wuhan is a beautiful city."
print(embeddings.shape)
# torch.Size([768])
```
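To make the indexing in `get_vector_simcse` concrete, here is a minimal sketch that uses a random NumPy array as a stand-in for `output.last_hidden_state`, whose shape is (batch, sequence length, hidden size). The 12-token length is an arbitrary assumption and the values are random, not model output; the hidden size of 768 matches the shape printed above. `[:, 0]` keeps the first position, which holds the `[CLS]` token that `tokenizer.encode` prepends, and `squeeze(0)` drops the batch dimension of size 1:

```python
import numpy as np

# Random stand-in for output.last_hidden_state (NOT real model output).
# Assumptions: batch of 1, a hypothetical 12-token input, hidden size 768.
batch, seq_len, hidden_size = 1, 12, 768
last_hidden_state = np.random.default_rng(0).standard_normal((batch, seq_len, hidden_size))

# [:, 0] keeps position 0 ([CLS]) for every item in the batch
cls_vectors = last_hidden_state[:, 0]
print(cls_vectors.shape)  # (1, 768)

# squeeze(0) removes the batch dimension of size 1
embedding = cls_vectors.squeeze(0)
print(embedding.shape)  # (768,)
```

The same indexing works unchanged on the PyTorch tensor that the model actually returns.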

You can also compute the cosine similarity between two sentences:
```python
def get_similarity_two(sentence1, sentence2):
    """Cosine similarity between the SimCSE vectors of two sentences."""
    vec1 = get_vector_simcse(sentence1).tolist()
    vec2 = get_vector_simcse(sentence2).tolist()
    # cosine_similarity expects 2-D inputs and returns a 1x1 matrix here
    similarity = cosine_similarity([vec1], [vec2]).tolist()[0][0]
    return similarity


sentence1 = '你好吗'    # "How are you?"
sentence2 = '你还好吗'  # "Are you doing okay?"
result = get_similarity_two(sentence1, sentence2)
print(result)
# 0.848331
```
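To compare two groups of sentences rather than a single pair, every sentence in one group can be scored against every sentence in the other. The sketch below, using NumPy, shows what that pairwise matrix computes: rows are scaled to unit length, so their dot products are cosines. `cosine_similarity_matrix` is a hypothetical helper written for illustration, not part of this model's code, and the 2-D vectors are toy data rather than real embeddings:

```python
import numpy as np


def cosine_similarity_matrix(group_a, group_b):
    """Hypothetical helper: entry [i, j] is the cosine of group_a[i] and group_b[j].

    group_a has shape (m, d), group_b has shape (n, d); the result has shape (m, n).
    Assumes no all-zero rows (their norm would be 0).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Scale each row to unit length; dot products of unit vectors are cosines
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


# Toy vectors standing in for sentence embeddings (hypothetical data)
group_a = [[1.0, 0.0], [0.0, 1.0]]
group_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = cosine_similarity_matrix(group_a, group_b)
print(sims.shape)  # (2, 3)
```

sklearn's `cosine_similarity` already returns this (m, n) matrix when given two 2-D inputs, so with the model loaded one could collect the `get_vector_simcse(...).tolist()` vectors of each group into a list and pass the two lists to it directly.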