https://github.com/BM-K/Sentence-Embedding-is-all-you-need

# Korean-Sentence-Embedding
🍭 Korean sentence embedding repository. You can download the pre-trained models and run inference right away, and the repository also provides environments for training your own models.

## Quick tour
```python
import torch
from transformers import AutoModel, AutoTokenizer

def cal_score(a, b):
    # Cosine similarity between (batches of) embeddings, scaled to 0-100
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)

    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100

model = AutoModel.from_pretrained('BM-K/KoSimCSE-roberta-multitask')
tokenizer = AutoTokenizer.from_pretrained('BM-K/KoSimCSE-roberta-multitask')

sentences = ['치타가 들판을 가로 질러 먹이를 쫓는다.',      # A cheetah chases prey across a field.
             '치타 한 마리가 먹이 뒤에서 달리고 있다.',     # A cheetah is running behind its prey.
             '원숭이 한 마리가 드럼을 연주한다.']           # A monkey is playing drums.

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
embeddings, _ = model(**inputs, return_dict=False)

# Similarity between the first sentence and each of the other two;
# the two cheetah sentences should score much higher than the monkey one.
score01 = cal_score(embeddings[0][0], embeddings[1][0])
score02 = cal_score(embeddings[0][0], embeddings[2][0])
```
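For reference, the normalize-then-matrix-multiply step in `cal_score` is just cosine similarity scaled by 100, so it can be cross-checked against `torch.nn.functional.cosine_similarity`. A minimal sketch, using random tensors as stand-ins for real sentence embeddings (the 768 dimension is an assumption matching typical RoBERTa-base hidden size):

```python
import torch
import torch.nn.functional as F

def cal_score(a, b):
    # Same helper as in the quick tour: cosine similarity scaled to 0-100
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)
    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100

torch.manual_seed(0)
a = torch.randn(768)  # stand-in for one sentence embedding
b = torch.randn(768)  # stand-in for another sentence embedding

ours = cal_score(a, b).item()
ref = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0)).item() * 100
assert abs(ours - ref) < 1e-3  # both paths agree up to float32 rounding
```

This also makes clear why the helper unsqueezes 1-D inputs: `torch.mm` requires 2-D operands, so single embeddings are promoted to batches of size one.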