Commit 1ab59de, committed by csukuangfj
1 parent: da1570d
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 *.far filter=lfs diff=lfs merge=lfs -text
+*.utf8 filter=lfs diff=lfs merge=lfs -text
dict/README.md ADDED
@@ -0,0 +1,31 @@
+# CppJieba dictionaries
+
+The file extension indicates the encoding of the dictionary.
+For example, filename.utf8 is UTF-8 encoded and filename.gbk is GBK encoded.
+
+
+## Word segmentation
+
+### jieba.dict.utf8/gbk
+
+The dictionary used by the maximum-probability segmenter (MPSegment: Max Probability).
+
+### hmm_model.utf8/gbk
+
+The dictionary used by the Hidden Markov Model segmenter (HMMSegment: Hidden Markov Model).
+
+__MixSegment (which combines MPSegment and HMMSegment) uses both of the dictionaries above.__
+
+
+## Keyword extraction
+
+### idf.utf8
+
+IDF (Inverse Document Frequency).
+KeywordExtractor uses the classic TF-IDF algorithm, so it needs this dictionary to supply the IDF values.
+
+### stop_words.utf8
+
+The stop-word dictionary.
+
+
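The README says KeywordExtractor relies on TF-IDF with an IDF dictionary and a stop-word list. The sketch below illustrates that idea in Python on toy in-memory data. It is not CppJieba's implementation: the function name `extract_keywords`, the use of a raw count as the term frequency, and the average-IDF fallback for words missing from the dictionary are all assumptions made for illustration, and the on-disk format of `idf.utf8` and `stop_words.utf8` is not shown in this commit, so no file parsing is attempted.

```python
from collections import Counter


def extract_keywords(words, idf, stop_words, top_k=5):
    """Rank already-segmented words by TF-IDF (hypothetical helper).

    words      -- list of tokens produced by some segmenter
    idf        -- dict mapping word -> IDF value (toy stand-in for idf.utf8)
    stop_words -- set of words to ignore (toy stand-in for stop_words.utf8)
    """
    # Assumption: words absent from the IDF table get the average IDF.
    default_idf = sum(idf.values()) / len(idf)

    # Term frequency: raw count of each non-stop word.
    counts = Counter(w for w in words if w not in stop_words)

    # TF-IDF score = term frequency * inverse document frequency.
    scores = {w: c * idf.get(w, default_idf) for w, c in counts.items()}

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


toy_idf = {"dictionary": 2.0, "segmentation": 3.0, "word": 1.0}
toy_stop_words = {"the", "a"}
toy_words = ["the", "word", "segmentation", "dictionary",
             "the", "word", "a", "segmentation"]

top = extract_keywords(toy_words, toy_idf, toy_stop_words, top_k=2)
print(top)
```

With the toy data, "segmentation" appears twice with the highest IDF, so it ranks first; the stop words "the" and "a" never receive a score.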
dict/hmm_model.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f17790586ac86dd048c8adffed052c4bd2b28ed0682972c1275e59040c0589a7
+size 519739
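Each dictionary file in this commit is stored as a Git LFS pointer: three `key value` lines giving the spec version, the object ID, and the size in bytes. As a minimal sketch, a pointer could be read like this in Python. The function name `parse_lfs_pointer` and the returned dict layout are hypothetical, chosen only for illustration.

```python
def parse_lfs_pointer(text):
    """Parse the text of a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<key> <value>", split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer content copied from dict/hmm_model.utf8 in this commit.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f17790586ac86dd048c8adffed052c4bd2b28ed0682972c1275e59040c0589a7\n"
    "size 519739\n"
)
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```

A consumer that sees this three-line text instead of dictionary content has most likely cloned the repository without fetching the LFS objects.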
dict/idf.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dbd1e03d72b2263cc8d84a4304ed77677eed9e7deaf43a1a5133bbba9733b535
+size 5998717
dict/jieba.dict.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3043b77068e09c9904f27cad82f12b6ebe9dbdb5aeff3b25e45ab7f9c1122b55
+size 5071204
dict/pos_dict/char_state_tab.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:28b7be1dd7369766a51445af4d42e9a2ba4bf374c13be5bc1ca7721e27271dbb
+size 327139
dict/pos_dict/prob_emit.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c33c4cb7edf3b3a5947df7209b6e9f267eae1f21335d9e2bd2521ea07105457a
+size 1687686
dict/pos_dict/prob_start.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:13623ea0e9300bdb597cb2da28770b7b385d6c0098d66e516083fb01b6bd5d96
+size 4347
dict/pos_dict/prob_trans.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f22363e2307408293d180c6f9f6b5cb75879d52f722f7764fa2d3d0ae2400236
+size 124159
dict/stop_words.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b788b8a939d2e2fe079abd579ea98f12f9fb84370bfd0dddd81bb9381f7ab42c
+size 8974
dict/user.dict.utf8 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:495bbf49270408a1234690e1e6a97328f30a482a7a72aa769e8a12e8714b0c62
+size 49