raynardj committed on
Commit
5c4a81e
1 Parent(s): 7fa95d1

Update README.md

  1. README.md +37 -1
README.md CHANGED
@@ -14,4 +14,40 @@ tags:
  * This model helps you **find** text within **ancient Chinese** literature, but you can **search with modern Chinese**
 
  # Cross-language search (跨语种搜索)
- ## 博古搜今 (search the ancient with the modern)
+ ## 博古搜今 (search the ancient with the modern)
+ ```python
+ from unpackai.interp import CosineSearch
+ from sentence_transformers import SentenceTransformer
+ import pandas as pd
+ import numpy as np
+
+ TAG = "raynardj/xlsearch-cross-lang-search-zh-vs-classicical-cn"
+ encoder = SentenceTransformer(TAG)
+
+ # all_lines is a list of all your sentences: it can be one book
+ # split into sentences, or many books
+ all_lines = ["句子1","句子2",...]
+ vec = encoder.encode(all_lines, batch_size=32, show_progress_bar=True)
+
+ # cosine-distance searcher
+ cosine = CosineSearch(vec)
+
+ def search(text):
+     enc = encoder.encode(text)  # encode the search key
+     order = cosine(enc)  # ranked indices into all_lines; the top 5 are kept below
+     sentence_df = pd.DataFrame({"sentence": np.array(all_lines)[order[:5]]})
+     return sentence_df
+ ```
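For readers without `unpackai` installed, the ranking step can be approximated with plain NumPy. This is a minimal sketch of cosine-similarity ranking, not the `CosineSearch` implementation itself; the function name `cosine_rank` and the toy vectors are assumptions made for illustration.

```python
import numpy as np


def cosine_rank(base: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Return row indices of `base`, most similar to `query` first.

    Hypothetical helper: a stand-in for the ranking that the README
    obtains from CosineSearch, written with NumPy only.
    """
    base_norm = np.linalg.norm(base, axis=1)    # length of each stored vector
    query_norm = np.linalg.norm(query)          # length of the query vector
    # Cosine similarity = dot product divided by the product of the norms.
    # The small constant guards against division by zero.
    similarity = (base @ query) / (base_norm * query_norm + 1e-12)
    return np.argsort(-similarity)              # descending similarity


# Toy 2-d "embeddings" standing in for encoder.encode(all_lines)
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
order = cosine_rank(vectors, np.array([1.0, 0.1]))
```

With real embeddings, `order[:5]` would pick the five closest sentences, in the same way `search` slices its ranking.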
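+
+ After splitting the Shiji (史记, Records of the Grand Historian) into sentences, a search gives results like the following:
+ ```python
+ >>> search("他是一个很慷慨的人")  # "He is a very generous person"
+ ```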
+ ```
+    sentence
+ 0  季布者,楚人也。为气任侠,有名於楚。
+ 1  董仲舒为人廉直。
+ 2  大将军为人仁善退让,以和柔自媚於上,然天下未有称也。
+ 3  勃为人木彊敦厚,高帝以为可属大事。
+ 4  石奢者,楚昭王相也。坚直廉正,无所阿避。
+ ```
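The README does not show how the book was split into sentences before encoding. One possible approach, sketched here as an assumption, is to cut after sentence-ending punctuation; the helper name `split_sentences` and the choice of delimiters are hypothetical, and the results above (where some rows hold two sentences) suggest the author used a different segmentation.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split Chinese text after each full stop, exclamation or question mark.

    Hypothetical helper, not the author's preprocessing: the delimiter set
    is an assumption and can be adjusted for a given corpus.
    """
    # Zero-width lookbehind keeps the punctuation attached to its sentence
    # (splitting on empty matches needs Python 3.7 or later).
    parts = re.split(r"(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]


sample = "季布者,楚人也。为气任侠,有名於楚。董仲舒为人廉直。"
all_lines_example = split_sentences(sample)
```

A list built this way could be passed to `encoder.encode` in place of the placeholder `all_lines`.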