Commit 1ab59de, committed by csukuangfj
1 Parent(s): da1570d

update
- .gitattributes +1 -0
- dict/README.md +31 -0
- dict/hmm_model.utf8 +3 -0
- dict/idf.utf8 +3 -0
- dict/jieba.dict.utf8 +3 -0
- dict/pos_dict/char_state_tab.utf8 +3 -0
- dict/pos_dict/prob_emit.utf8 +3 -0
- dict/pos_dict/prob_start.utf8 +3 -0
- dict/pos_dict/prob_trans.utf8 +3 -0
- dict/stop_words.utf8 +3 -0
- dict/user.dict.utf8 +3 -0
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 *.far filter=lfs diff=lfs merge=lfs -text
+*.utf8 filter=lfs diff=lfs merge=lfs -text
dict/README.md
ADDED
@@ -0,0 +1,31 @@
# CppJieba Dictionaries

The file extension indicates the dictionary's encoding.
For example, filename.utf8 is UTF-8 encoded and filename.gbk is GBK encoded.


## Segmentation

### jieba.dict.utf8/gbk

The dictionary used by the maximum-probability segmenter (MPSegment: Max Probability).

### hmm_model.utf8/gbk

The dictionary used by the hidden Markov model segmenter (HMMSegment: Hidden Markov Model).

__MixSegment (which combines MPSegment and HMMSegment) uses both of the dictionaries above.__
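
To make the role of a word-frequency dictionary concrete, here is a minimal sketch of maximum-probability segmentation in Python. It is not CppJieba's implementation: the function name `max_prob_segment`, the toy counts in `TOY_FREQ`, and the fallback count of 1 for unknown single characters are all assumptions made for illustration.

```python
import math


def max_prob_segment(text, freq):
    """Split text into the word sequence with the highest total log-probability."""
    total = sum(freq.values())
    max_len = max(len(w) for w in freq)
    n = len(text)
    # best[i] = (score of the best segmentation of text[i:], end index of its first word)
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            # A single character is always allowed so that a path always exists.
            if word in freq or j == i + 1:
                # Assumption: unknown single characters fall back to a count of 1.
                score = math.log(freq.get(word, 1) / total) + best[j][0]
                candidates.append((score, j))
        best[i] = max(candidates)
    # Walk the stored end indices from the start to recover the words.
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


# Hypothetical toy counts; a real dictionary is far larger.
TOY_FREQ = {
    "南京": 50, "南京市": 40, "市长": 30, "长江": 60,
    "大桥": 40, "长江大桥": 30, "江": 5, "大": 20, "桥": 10,
}

print(max_prob_segment("南京市长江大桥", TOY_FREQ))
```

The dynamic program runs right to left so that each position reuses the best score of the suffix that follows it. Words the dictionary cannot cover are what an HMM-based segmenter is meant to handle, which is why a mixed segmenter needs both resources.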


## Keyword Extraction

### idf.utf8

IDF (Inverse Document Frequency).
KeywordExtractor uses the classic TF-IDF algorithm, so it needs a dictionary that supplies IDF values.
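
As a rough illustration of how an IDF table and a stop-word list combine in TF-IDF keyword ranking, consider the Python sketch below. It is not the KeywordExtractor code: the function name `extract_keywords`, the term-frequency normalization, the mean-IDF fallback for unseen words, and all sample values are assumptions.

```python
from collections import Counter


def extract_keywords(words, idf, stop_words, top_k=3):
    """Rank already-segmented words by TF-IDF weight, highest first."""
    kept = [w for w in words if w not in stop_words]
    if not kept:
        return []
    # Assumption: words missing from the IDF table get the table's mean value.
    default_idf = sum(idf.values()) / len(idf)
    tf = Counter(kept)
    scores = {
        w: (count / len(kept)) * idf.get(w, default_idf)
        for w, count in tf.items()
    }
    # Sort by descending score, then by word for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical sample data, not taken from idf.utf8 or stop_words.utf8.
SAMPLE_WORDS = ["我", "喜欢", "分词", "的", "分词", "工具"]
SAMPLE_IDF = {"我": 1.0, "喜欢": 4.0, "分词": 9.0, "工具": 5.0}
SAMPLE_STOP = {"的", "我"}

print(extract_keywords(SAMPLE_WORDS, SAMPLE_IDF, SAMPLE_STOP, top_k=2))
```

Stop words are dropped before counting, so they affect neither the term frequencies nor the final ranking.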

### stop_words.utf8

The stop-word dictionary.
dict/hmm_model.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f17790586ac86dd048c8adffed052c4bd2b28ed0682972c1275e59040c0589a7
size 519739
dict/idf.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dbd1e03d72b2263cc8d84a4304ed77677eed9e7deaf43a1a5133bbba9733b535
size 5998717
dict/jieba.dict.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3043b77068e09c9904f27cad82f12b6ebe9dbdb5aeff3b25e45ab7f9c1122b55
size 5071204
dict/pos_dict/char_state_tab.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:28b7be1dd7369766a51445af4d42e9a2ba4bf374c13be5bc1ca7721e27271dbb
size 327139
dict/pos_dict/prob_emit.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c33c4cb7edf3b3a5947df7209b6e9f267eae1f21335d9e2bd2521ea07105457a
size 1687686
dict/pos_dict/prob_start.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:13623ea0e9300bdb597cb2da28770b7b385d6c0098d66e516083fb01b6bd5d96
size 4347
dict/pos_dict/prob_trans.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f22363e2307408293d180c6f9f6b5cb75879d52f722f7764fa2d3d0ae2400236
size 124159
dict/stop_words.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b788b8a939d2e2fe079abd579ea98f12f9fb84370bfd0dddd81bb9381f7ab42c
size 8974
dict/user.dict.utf8
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:495bbf49270408a1234690e1e6a97328f30a482a7a72aa769e8a12e8714b0c62
size 49
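
Each added `.utf8` file in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, the content hash, and the size in bytes. The Python sketch below reads such a pointer into a dictionary. The helper name `parse_lfs_pointer` and the returned field names are hypothetical; the sample text is the pointer shown for `dict/user.dict.utf8`.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:495bbf49270408a1234690e1e6a97328f30a482a7a72aa769e8a12e8714b0c62
size 49
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

A checkout without Git LFS installed leaves these pointer files in place of the real dictionaries, so a size check like this one can help detect that situation before loading a dictionary.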