osanseviero committed
Commit c4017fc • 1 Parent(s): e35ed0c

Update spaCy pipeline

.gitattributes CHANGED
@@ -14,3 +14,7 @@
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
+ *.whl filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *strings.json filter=lfs diff=lfs merge=lfs -text
+ vectors filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright 2021 ExplosionAI GmbH
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the "Software"), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+ of the Software, and to permit persons to whom the Software is furnished to do
+ so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
LICENSES_SOURCES ADDED
@@ -0,0 +1,36 @@
+ # OntoNotes 5
+
+ * Author: Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston
+ * URL: https://catalog.ldc.upenn.edu/LDC2013T19
+ * License: commercial (licensed by Explosion)
+
+ ```
+ ```
+
+
+
+
+ # CoreNLP Universal Dependencies Converter
+
+ * Author: Stanford NLP Group
+ * URL: https://nlp.stanford.edu/software/stanford-dependencies.html
+ * License: Citation provided for reference, no code packaged with model
+
+ ```
+ ```
+
+
+
+
+ # bert-base-chinese
+
+ * Author: Hugging Face
+ * URL: https://huggingface.co/bert-base-chinese
+ * License:
+
+ ```
+ ```
+
+
+
+
README.md ADDED
@@ -0,0 +1,102 @@
+ ---
+ tags:
+ - spacy
+ - token-classification
+ language:
+ - zh
+ license: MIT
+ model-index:
+ - name: zh_core_web_trf
+   results:
+   - tasks:
+       name: NER
+       type: token-classification
+     metrics:
+     - name: Precision
+       type: precision
+       value: 0.6744976507
+     - name: Recall
+       type: recall
+       value: 0.7414285714
+     - name: F Score
+       type: f_score
+       value: 0.7063811967
+   - tasks:
+       name: POS
+       type: token-classification
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.92444533
+   - tasks:
+       name: SENTER
+       type: token-classification
+     metrics:
+     - name: Precision
+       type: precision
+       value: 0.6917940608
+     - name: Recall
+       type: recall
+       value: 0.655402031
+     - name: F Score
+       type: f_score
+       value: 0.6731065139
+   - tasks:
+       name: UNLABELED_DEPENDENCIES
+       type: token-classification
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.7661671924
+   - tasks:
+       name: LABELED_DEPENDENCIES
+       type: token-classification
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.7661671924
+ ---
+ ### Details: https://spacy.io/models/zh#zh_core_web_trf
+
+ Chinese transformer pipeline (bert-base-chinese). Components: transformer, tagger, parser, ner, attribute_ruler.
+
+ | Feature | Description |
+ | --- | --- |
+ | **Name** | `zh_core_web_trf` |
+ | **Version** | `3.1.0` |
+ | **spaCy** | `>=3.1.0,<3.2.0` |
+ | **Default Pipeline** | `transformer`, `tagger`, `parser`, `attribute_ruler`, `ner` |
+ | **Components** | `transformer`, `tagger`, `parser`, `attribute_ruler`, `ner` |
+ | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
+ | **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[CoreNLP Universal Dependencies Converter](https://nlp.stanford.edu/software/stanford-dependencies.html) (Stanford NLP Group)<br />[bert-base-chinese](https://huggingface.co/bert-base-chinese) (Hugging Face) |
+ | **License** | `MIT` |
+ | **Author** | [Explosion](https://explosion.ai) |
+
+ ### Label Scheme
+
+ <details>
+
+ <summary>View label scheme (99 labels for 3 components)</summary>
+
+ | Component | Labels |
+ | --- | --- |
+ | **`tagger`** | `AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X` |
+ | **`parser`** | `ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp` |
+ | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART` |
+
+ </details>
+
+ ### Accuracy
+
+ | Type | Score |
+ | --- | --- |
+ | `TOKEN_ACC` | 97.88 |
+ | `TAG_ACC` | 92.44 |
+ | `DEP_UAS` | 76.62 |
+ | `DEP_LAS` | 72.80 |
+ | `ENTS_P` | 67.45 |
+ | `ENTS_R` | 74.14 |
+ | `ENTS_F` | 70.64 |
+ | `SENTS_P` | 69.18 |
+ | `SENTS_R` | 65.54 |
+ | `SENTS_F` | 67.31 |
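
The model card above describes a standard spaCy v3 pipeline, so it can be driven entirely through the usual spaCy API. A minimal usage sketch, assuming the `zh_core_web_trf` package has been installed (for example by pip-installing the wheel shipped at the bottom of this commit); the example sentence is illustrative only:

```python
import spacy

# Load the installed pipeline; this pulls in the transformer, tagger, parser,
# attribute_ruler and ner components listed in the model card.
nlp = spacy.load("zh_core_web_trf")

doc = nlp("微软计划2025年在北京开设新的研究中心。")

# Token-level annotations from the tagger/parser (tags follow the scheme above).
for token in doc:
    print(token.text, token.tag_, token.dep_, token.head.text)

# Entities from the ner component use the OntoNotes 5 label scheme (ORG, GPE, DATE, ...).
for ent in doc.ents:
    print(ent.text, ent.label_)
```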
accuracy.json ADDED
@@ -0,0 +1,332 @@
+ {
+ "token_acc": 0.9788303388,
+ "tag_acc": 0.92444533,
+ "dep_uas": 0.7661671924,
+ "dep_las": 0.727981345,
+ "ents_p": 0.6744976507,
+ "ents_r": 0.7414285714,
+ "ents_f": 0.7063811967,
+ "sents_p": 0.6917940608,
+ "sents_r": 0.655402031,
+ "sents_f": 0.6731065139,
+ "speed": 4304.7686585922,
+ "dep_las_per_type": {
+ "dep": {
+ "p": 0.5611908839,
+ "r": 0.4304334647,
+ "f": 0.4871912168
+ },
+ "case": {
+ "p": 0.9069435432,
+ "r": 0.8472356935,
+ "f": 0.8760734658
+ },
+ "nmod:tmod": {
+ "p": 0.8046448087,
+ "r": 0.8013605442,
+ "f": 0.8029993183
+ },
+ "nummod": {
+ "p": 0.9012345679,
+ "r": 0.583610926,
+ "f": 0.7084512738
+ },
+ "mark:clf": {
+ "p": 0.9517326733,
+ "r": 0.5736665423,
+ "f": 0.7158482662
+ },
+ "auxpass": {
+ "p": 0.9293478261,
+ "r": 0.9243243243,
+ "f": 0.9268292683
+ },
+ "nsubj": {
+ "p": 0.8733342307,
+ "r": 0.7969536912,
+ "f": 0.8333975594
+ },
+ "acl": {
+ "p": 0.8216805645,
+ "r": 0.7104825291,
+ "f": 0.762046401
+ },
+ "advmod": {
+ "p": 0.8829383266,
+ "r": 0.7643391521,
+ "f": 0.819369342
+ },
+ "mark": {
+ "p": 0.865225391,
+ "r": 0.82427695,
+ "f": 0.8442549372
+ },
+ "xcomp": {
+ "p": 0.8221343874,
+ "r": 0.67752443,
+ "f": 0.7428571429
+ },
+ "nmod:assmod": {
+ "p": 0.8709787817,
+ "r": 0.7920946156,
+ "f": 0.8296658517
+ },
+ "det": {
+ "p": 0.9033306255,
+ "r": 0.6514352665,
+ "f": 0.7569775357
+ },
+ "amod": {
+ "p": 0.8509749304,
+ "r": 0.7199528672,
+ "f": 0.78
+ },
+ "nmod:prep": {
+ "p": 0.8192449048,
+ "r": 0.7416817907,
+ "f": 0.7785362756
+ },
+ "root": {
+ "p": 0.7726093403,
+ "r": 0.6940236391,
+ "f": 0.7312110848
+ },
+ "aux:prtmod": {
+ "p": 0.9042145594,
+ "r": 0.8428571429,
+ "f": 0.8724584104
+ },
+ "compound:nn": {
+ "p": 0.8040945994,
+ "r": 0.7708967851,
+ "f": 0.7871458189
+ },
+ "dobj": {
+ "p": 0.9081300813,
+ "r": 0.8272848467,
+ "f": 0.8658243547
+ },
+ "ccomp": {
+ "p": 0.7877000842,
+ "r": 0.7270606532,
+ "f": 0.7561665993
+ },
+ "advmod:rcomp": {
+ "p": 0.8475609756,
+ "r": 0.7700831025,
+ "f": 0.8069666183
+ },
+ "nmod:topic": {
+ "p": 0.5434782609,
+ "r": 0.487012987,
+ "f": 0.5136986301
+ },
+ "cop": {
+ "p": 0.8351555929,
+ "r": 0.638996139,
+ "f": 0.7240247904
+ },
+ "discourse": {
+ "p": 0.6153846154,
+ "r": 0.5478547855,
+ "f": 0.5796595373
+ },
+ "neg": {
+ "p": 0.8932496075,
+ "r": 0.6765755054,
+ "f": 0.7699594046
+ },
+ "aux:modal": {
+ "p": 0.9019823789,
+ "r": 0.8469493278,
+ "f": 0.8736
+ },
+ "nmod": {
+ "p": 0.7988505747,
+ "r": 0.7544097693,
+ "f": 0.7759944173
+ },
+ "aux:ba": {
+ "p": 0.9265536723,
+ "r": 0.8723404255,
+ "f": 0.898630137
+ },
+ "advmod:loc": {
+ "p": 0.80859375,
+ "r": 0.6142433234,
+ "f": 0.6981450253
+ },
+ "aux:asp": {
+ "p": 0.9320882852,
+ "r": 0.8755980861,
+ "f": 0.9029605263
+ },
+ "conj": {
+ "p": 0.6313854489,
+ "r": 0.6168241966,
+ "f": 0.6240198891
+ },
+ "nsubjpass": {
+ "p": 0.8913043478,
+ "r": 0.82,
+ "f": 0.8541666667
+ },
+ "compound:vc": {
+ "p": 0.5459183673,
+ "r": 0.5544041451,
+ "f": 0.5501285347
+ },
+ "advcl:loc": {
+ "p": 0.7586206897,
+ "r": 0.6285714286,
+ "f": 0.6875
+ },
+ "cc": {
+ "p": 0.7972477064,
+ "r": 0.7710736469,
+ "f": 0.7839422643
+ },
+ "advmod:dvp": {
+ "p": 0.9076923077,
+ "r": 0.7329192547,
+ "f": 0.8109965636
+ },
+ "amod:ordmod": {
+ "p": 0.6666666667,
+ "r": 0.59375,
+ "f": 0.6280991736
+ },
+ "appos": {
+ "p": 0.9434889435,
+ "r": 0.8827586207,
+ "f": 0.9121140143
+ },
+ "nmod:poss": {
+ "p": 0.7647058824,
+ "r": 0.6740740741,
+ "f": 0.7165354331
+ },
+ "name": {
+ "p": 0.6097560976,
+ "r": 0.5555555556,
+ "f": 0.5813953488
+ },
+ "nsubj:xsubj": {
+ "p": 0.5,
+ "r": 0.1,
+ "f": 0.1666666667
+ },
+ "nmod:range": {
+ "p": 0.8295454545,
+ "r": 0.7348993289,
+ "f": 0.7793594306
+ },
+ "parataxis:prnmod": {
+ "p": 0.3663366337,
+ "r": 0.2781954887,
+ "f": 0.3162393162
+ },
+ "erased": {
+ "p": 0.0,
+ "r": 0.0,
+ "f": 0.0
+ },
+ "etc": {
+ "p": 0.9285714286,
+ "r": 0.9285714286,
+ "f": 0.9285714286
+ }
+ },
+ "ents_per_type": {
+ "DATE": {
+ "p": 0.6931530008,
+ "r": 0.8126858276,
+ "f": 0.7481751825
+ },
+ "GPE": {
+ "p": 0.792633015,
+ "r": 0.8519061584,
+ "f": 0.8212014134
+ },
+ "CARDINAL": {
+ "p": 0.5527502254,
+ "r": 0.6179435484,
+ "f": 0.5835316516
+ },
+ "ORDINAL": {
+ "p": 0.8287292818,
+ "r": 0.7894736842,
+ "f": 0.8086253369
+ },
+ "FAC": {
+ "p": 0.5301204819,
+ "r": 0.4731182796,
+ "f": 0.5
+ },
+ "ORG": {
+ "p": 0.7304479879,
+ "r": 0.7321156773,
+ "f": 0.7312808818
+ },
+ "LOC": {
+ "p": 0.1899383984,
+ "r": 0.497311828,
+ "f": 0.2748885587
+ },
+ "NORP": {
+ "p": 0.6797900262,
+ "r": 0.5441176471,
+ "f": 0.6044340723
+ },
+ "QUANTITY": {
+ "p": 0.7363636364,
+ "r": 0.6,
+ "f": 0.6612244898
+ },
+ "PERSON": {
+ "p": 0.8649842271,
+ "r": 0.8833762887,
+ "f": 0.8740835193
+ },
+ "TIME": {
+ "p": 0.711627907,
+ "r": 0.7427184466,
+ "f": 0.7268408551
+ },
+ "WORK_OF_ART": {
+ "p": 0.1849315068,
+ "r": 0.18,
+ "f": 0.1824324324
+ },
+ "MONEY": {
+ "p": 0.8682170543,
+ "r": 0.8296296296,
+ "f": 0.8484848485
+ },
+ "EVENT": {
+ "p": 0.5804195804,
+ "r": 0.6102941176,
+ "f": 0.5949820789
+ },
+ "PERCENT": {
+ "p": 0.7640449438,
+ "r": 0.8192771084,
+ "f": 0.7906976744
+ },
+ "PRODUCT": {
+ "p": 0.5384615385,
+ "r": 0.1428571429,
+ "f": 0.2258064516
+ },
+ "LAW": {
+ "p": 0.3076923077,
+ "r": 0.2666666667,
+ "f": 0.2857142857
+ },
+ "LANGUAGE": {
+ "p": 0.8181818182,
+ "r": 1.0,
+ "f": 0.9
+ }
+ }
+ }
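
Since accuracy.json is plain JSON, the per-label breakdown above is easy to inspect programmatically. A small sketch (the local file path is assumed):

```python
import json

# Read the evaluation results shipped with the pipeline.
with open("accuracy.json", encoding="utf8") as f:
    scores = json.load(f)

print(f"tag_acc={scores['tag_acc']:.4f}  dep_uas={scores['dep_uas']:.4f}  ents_f={scores['ents_f']:.4f}")

# Sort entity types by F-score to see the weakest labels (e.g. WORK_OF_ART, PRODUCT, LOC).
for label, m in sorted(scores["ents_per_type"].items(), key=lambda kv: kv[1]["f"]):
    print(f"{label:<12} p={m['p']:.3f} r={m['r']:.3f} f={m['f']:.3f}")
```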
attribute_ruler/patterns ADDED
Binary file (1.93 kB).
config.cfg ADDED
@@ -0,0 +1,217 @@
+ [paths]
+ train = "corpus/zh-core-news/train.spacy"
+ dev = "corpus/zh-core-news/dev.spacy"
+ vectors = null
+ raw = null
+ init_tok2vec = null
+ vocab_data = null
+
+ [system]
+ gpu_allocator = "pytorch"
+ seed = 1
+
+ [nlp]
+ lang = "zh"
+ pipeline = ["transformer","tagger","parser","attribute_ruler","ner"]
+ disabled = []
+ before_creation = null
+ after_creation = null
+ after_pipeline_creation = null
+ batch_size = 64
+
+ [nlp.tokenizer]
+ @tokenizers = "spacy.zh.ChineseTokenizer"
+ segmenter = "pkuseg"
+
+ [components]
+
+ [components.attribute_ruler]
+ factory = "attribute_ruler"
+ validate = false
+
+ [components.ner]
+ factory = "ner"
+ incorrect_spans_key = null
+ moves = null
+ update_with_oracle_cut_size = 100
+
+ [components.ner.model]
+ @architectures = "spacy.TransitionBasedParser.v2"
+ state_type = "ner"
+ extra_state_tokens = false
+ hidden_width = 64
+ maxout_pieces = 2
+ use_upper = false
+ nO = null
+
+ [components.ner.model.tok2vec]
+ @architectures = "spacy-transformers.TransformerListener.v1"
+ grad_factor = 1.0
+ upstream = "transformer"
+ pooling = {"@layers":"reduce_mean.v1"}
+
+ [components.parser]
+ factory = "parser"
+ learn_tokens = false
+ min_action_freq = 30
+ moves = null
+ update_with_oracle_cut_size = 100
+
+ [components.parser.model]
+ @architectures = "spacy.TransitionBasedParser.v2"
+ state_type = "parser"
+ extra_state_tokens = false
+ hidden_width = 64
+ maxout_pieces = 2
+ use_upper = false
+ nO = null
+
+ [components.parser.model.tok2vec]
+ @architectures = "spacy-transformers.TransformerListener.v1"
+ grad_factor = 1.0
+ upstream = "transformer"
+ pooling = {"@layers":"reduce_mean.v1"}
+
+ [components.tagger]
+ factory = "tagger"
+
+ [components.tagger.model]
+ @architectures = "spacy.Tagger.v1"
+ nO = null
+
+ [components.tagger.model.tok2vec]
+ @architectures = "spacy-transformers.TransformerListener.v1"
+ grad_factor = 1.0
+ upstream = "transformer"
+ pooling = {"@layers":"reduce_mean.v1"}
+
+ [components.transformer]
+ factory = "transformer"
+ max_batch_items = 4096
+ set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
+
+ [components.transformer.model]
+ @architectures = "spacy-transformers.TransformerModel.v1"
+ name = "bert-base-chinese"
+
+ [components.transformer.model.get_spans]
+ @span_getters = "spacy-transformers.strided_spans.v1"
+ window = 128
+ stride = 96
+
+ [components.transformer.model.tokenizer_config]
+ use_fast = true
+
+ [corpora]
+
+ [corpora.dev]
+ @readers = "spacy.Corpus.v1"
+ limit = 0
+ max_length = 0
+ path = ${paths:dev}
+ gold_preproc = false
+ augmenter = null
+
+ [corpora.train]
+ @readers = "spacy.Corpus.v1"
+ path = ${paths:train}
+ max_length = 500
+ gold_preproc = false
+ limit = 0
+ augmenter = null
+
+ [training]
+ train_corpus = "corpora.train"
+ dev_corpus = "corpora.dev"
+ seed = ${system:seed}
+ gpu_allocator = ${system:gpu_allocator}
+ dropout = 0.1
+ accumulate_gradient = 3
+ patience = 5000
+ max_epochs = 0
+ max_steps = 20000
+ eval_frequency = 1000
+ frozen_components = []
+ before_to_disk = null
+ annotating_components = []
+
+ [training.batcher]
+ @batchers = "spacy.batch_by_padded.v1"
+ discard_oversize = true
+ get_length = null
+ size = 2000
+ buffer = 256
+
+ [training.logger]
+ @loggers = "spacy.ConsoleLogger.v1"
+ progress_bar = false
+
+ [training.optimizer]
+ @optimizers = "Adam.v1"
+ beta1 = 0.9
+ beta2 = 0.999
+ L2_is_weight_decay = true
+ L2 = 0.01
+ grad_clip = 1.0
+ use_averages = true
+ eps = 0.00000001
+
+ [training.optimizer.learn_rate]
+ @schedules = "warmup_linear.v1"
+ warmup_steps = 250
+ total_steps = 20000
+ initial_rate = 0.00005
+
+ [training.score_weights]
+ tag_acc = 0.32
+ dep_uas = 0.0
+ dep_las = 0.32
+ dep_las_per_type = null
+ sents_p = null
+ sents_r = null
+ sents_f = 0.04
+ ents_f = 0.32
+ ents_p = 0.0
+ ents_r = 0.0
+ ents_per_type = null
+
+ [pretraining]
+
+ [initialize]
+ vocab_data = ${paths.vocab_data}
+ vectors = ${paths.vectors}
+ init_tok2vec = ${paths.init_tok2vec}
+ before_init = null
+ after_init = null
+
+ [initialize.components]
+
+ [initialize.components.ner]
+
+ [initialize.components.ner.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/ner.json"
+ require = false
+
+ [initialize.components.parser]
+
+ [initialize.components.parser.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/parser.json"
+ require = false
+
+ [initialize.components.tagger]
+
+ [initialize.components.tagger.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/tagger.json"
+ require = false
+
+ [initialize.lookups]
+ @misc = "spacy.LookupsDataLoader.v1"
+ lang = ${nlp.lang}
+ tables = []
+
+ [initialize.tokenizer]
+ pkuseg_model = "assets/pkuseg_model"
+ pkuseg_user_dict = "default"
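
The config above is a regular spaCy v3 training config: every component listens to a shared `bert-base-chinese` transformer via `TransformerListener`, and training runs for at most 20,000 steps with a warmup-linear learning-rate schedule. A sketch of inspecting it with spaCy's config loader (the local path to a checkout of config.cfg is assumed):

```python
from spacy import util

# Load and interpolate the training config shipped in this commit.
config = util.load_config("config.cfg", interpolate=True)

print(config["nlp"]["pipeline"])                             # component order
print(config["components"]["transformer"]["model"]["name"])  # "bert-base-chinese"
print(config["training"]["optimizer"]["learn_rate"])         # warmup_linear schedule settings
```

Retraining would normally go through the CLI, e.g. `spacy train config.cfg --paths.train ... --paths.dev ...`, with the corpus paths pointing at local `.spacy` files.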
meta.json ADDED
@@ -0,0 +1,504 @@
+ {
+ "lang":"zh",
+ "name":"core_web_trf",
+ "version":"3.1.0",
+ "description":"Chinese transformer pipeline (bert-base-chinese). Components: transformer, tagger, parser, ner, attribute_ruler.",
+ "author":"Explosion",
+ "email":"contact@explosion.ai",
+ "url":"https://explosion.ai",
+ "license":"MIT",
+ "spacy_version":">=3.1.0,<3.2.0",
+ "spacy_git_version":"caba63b74",
+ "vectors":{
+ "width":0,
+ "vectors":0,
+ "keys":0,
+ "name":null
+ },
+ "labels":{
+ "transformer":[
+
+ ],
+ "tagger":[
+ "AD",
+ "AS",
+ "BA",
+ "CC",
+ "CD",
+ "CS",
+ "DEC",
+ "DEG",
+ "DER",
+ "DEV",
+ "DT",
+ "ETC",
+ "FW",
+ "IJ",
+ "INF",
+ "JJ",
+ "LB",
+ "LC",
+ "M",
+ "MSP",
+ "NN",
+ "NR",
+ "NT",
+ "OD",
+ "ON",
+ "P",
+ "PN",
+ "PU",
+ "SB",
+ "SP",
+ "URL",
+ "VA",
+ "VC",
+ "VE",
+ "VV",
+ "X"
+ ],
+ "parser":[
+ "ROOT",
+ "acl",
+ "advcl:loc",
+ "advmod",
+ "advmod:dvp",
+ "advmod:loc",
+ "advmod:rcomp",
+ "amod",
+ "amod:ordmod",
+ "appos",
+ "aux:asp",
+ "aux:ba",
+ "aux:modal",
+ "aux:prtmod",
+ "auxpass",
+ "case",
+ "cc",
+ "ccomp",
+ "compound:nn",
+ "compound:vc",
+ "conj",
+ "cop",
+ "dep",
+ "det",
+ "discourse",
+ "dobj",
+ "etc",
+ "mark",
+ "mark:clf",
+ "name",
+ "neg",
+ "nmod",
+ "nmod:assmod",
+ "nmod:poss",
+ "nmod:prep",
+ "nmod:range",
+ "nmod:tmod",
+ "nmod:topic",
+ "nsubj",
+ "nsubj:xsubj",
+ "nsubjpass",
+ "nummod",
+ "parataxis:prnmod",
+ "punct",
+ "xcomp"
+ ],
+ "attribute_ruler":[
+
+ ],
+ "ner":[
+ "CARDINAL",
+ "DATE",
+ "EVENT",
+ "FAC",
+ "GPE",
+ "LANGUAGE",
+ "LAW",
+ "LOC",
+ "MONEY",
+ "NORP",
+ "ORDINAL",
+ "ORG",
+ "PERCENT",
+ "PERSON",
+ "PRODUCT",
+ "QUANTITY",
+ "TIME",
+ "WORK_OF_ART"
+ ]
+ },
+ "pipeline":[
+ "transformer",
+ "tagger",
+ "parser",
+ "attribute_ruler",
+ "ner"
+ ],
+ "components":[
+ "transformer",
+ "tagger",
+ "parser",
+ "attribute_ruler",
+ "ner"
+ ],
+ "disabled":[
+
+ ],
+ "performance":{
+ "token_acc":0.9788303388,
+ "tag_acc":0.92444533,
+ "dep_uas":0.7661671924,
+ "dep_las":0.727981345,
+ "ents_p":0.6744976507,
+ "ents_r":0.7414285714,
+ "ents_f":0.7063811967,
+ "sents_p":0.6917940608,
+ "sents_r":0.655402031,
+ "sents_f":0.6731065139,
+ "speed":4304.7686585922,
+ "dep_las_per_type":{
+ "dep":{
+ "p":0.5611908839,
+ "r":0.4304334647,
+ "f":0.4871912168
+ },
+ "case":{
+ "p":0.9069435432,
+ "r":0.8472356935,
+ "f":0.8760734658
+ },
+ "nmod:tmod":{
+ "p":0.8046448087,
+ "r":0.8013605442,
+ "f":0.8029993183
+ },
+ "nummod":{
+ "p":0.9012345679,
+ "r":0.583610926,
+ "f":0.7084512738
+ },
+ "mark:clf":{
+ "p":0.9517326733,
+ "r":0.5736665423,
+ "f":0.7158482662
+ },
+ "auxpass":{
+ "p":0.9293478261,
+ "r":0.9243243243,
+ "f":0.9268292683
+ },
+ "nsubj":{
+ "p":0.8733342307,
+ "r":0.7969536912,
+ "f":0.8333975594
+ },
+ "acl":{
+ "p":0.8216805645,
+ "r":0.7104825291,
+ "f":0.762046401
+ },
+ "advmod":{
+ "p":0.8829383266,
+ "r":0.7643391521,
+ "f":0.819369342
+ },
+ "mark":{
+ "p":0.865225391,
+ "r":0.82427695,
+ "f":0.8442549372
+ },
+ "xcomp":{
+ "p":0.8221343874,
+ "r":0.67752443,
+ "f":0.7428571429
+ },
+ "nmod:assmod":{
+ "p":0.8709787817,
+ "r":0.7920946156,
+ "f":0.8296658517
+ },
+ "det":{
+ "p":0.9033306255,
+ "r":0.6514352665,
+ "f":0.7569775357
+ },
+ "amod":{
+ "p":0.8509749304,
+ "r":0.7199528672,
+ "f":0.78
+ },
+ "nmod:prep":{
+ "p":0.8192449048,
+ "r":0.7416817907,
+ "f":0.7785362756
+ },
+ "root":{
+ "p":0.7726093403,
+ "r":0.6940236391,
+ "f":0.7312110848
+ },
+ "aux:prtmod":{
+ "p":0.9042145594,
+ "r":0.8428571429,
+ "f":0.8724584104
+ },
+ "compound:nn":{
+ "p":0.8040945994,
+ "r":0.7708967851,
+ "f":0.7871458189
+ },
+ "dobj":{
+ "p":0.9081300813,
+ "r":0.8272848467,
+ "f":0.8658243547
+ },
+ "ccomp":{
+ "p":0.7877000842,
+ "r":0.7270606532,
+ "f":0.7561665993
+ },
+ "advmod:rcomp":{
+ "p":0.8475609756,
+ "r":0.7700831025,
+ "f":0.8069666183
+ },
+ "nmod:topic":{
+ "p":0.5434782609,
+ "r":0.487012987,
+ "f":0.5136986301
+ },
+ "cop":{
+ "p":0.8351555929,
+ "r":0.638996139,
+ "f":0.7240247904
+ },
+ "discourse":{
+ "p":0.6153846154,
+ "r":0.5478547855,
+ "f":0.5796595373
+ },
+ "neg":{
+ "p":0.8932496075,
+ "r":0.6765755054,
+ "f":0.7699594046
+ },
+ "aux:modal":{
+ "p":0.9019823789,
+ "r":0.8469493278,
+ "f":0.8736
+ },
+ "nmod":{
+ "p":0.7988505747,
+ "r":0.7544097693,
+ "f":0.7759944173
+ },
+ "aux:ba":{
+ "p":0.9265536723,
+ "r":0.8723404255,
+ "f":0.898630137
+ },
+ "advmod:loc":{
+ "p":0.80859375,
+ "r":0.6142433234,
+ "f":0.6981450253
+ },
+ "aux:asp":{
+ "p":0.9320882852,
+ "r":0.8755980861,
+ "f":0.9029605263
+ },
+ "conj":{
+ "p":0.6313854489,
+ "r":0.6168241966,
+ "f":0.6240198891
+ },
+ "nsubjpass":{
+ "p":0.8913043478,
+ "r":0.82,
+ "f":0.8541666667
+ },
+ "compound:vc":{
+ "p":0.5459183673,
+ "r":0.5544041451,
+ "f":0.5501285347
+ },
+ "advcl:loc":{
+ "p":0.7586206897,
+ "r":0.6285714286,
+ "f":0.6875
+ },
+ "cc":{
+ "p":0.7972477064,
+ "r":0.7710736469,
+ "f":0.7839422643
+ },
+ "advmod:dvp":{
+ "p":0.9076923077,
+ "r":0.7329192547,
+ "f":0.8109965636
+ },
+ "amod:ordmod":{
+ "p":0.6666666667,
+ "r":0.59375,
+ "f":0.6280991736
+ },
+ "appos":{
+ "p":0.9434889435,
+ "r":0.8827586207,
+ "f":0.9121140143
+ },
+ "nmod:poss":{
+ "p":0.7647058824,
+ "r":0.6740740741,
+ "f":0.7165354331
+ },
+ "name":{
+ "p":0.6097560976,
+ "r":0.5555555556,
+ "f":0.5813953488
+ },
+ "nsubj:xsubj":{
+ "p":0.5,
+ "r":0.1,
+ "f":0.1666666667
+ },
+ "nmod:range":{
+ "p":0.8295454545,
+ "r":0.7348993289,
+ "f":0.7793594306
+ },
+ "parataxis:prnmod":{
+ "p":0.3663366337,
+ "r":0.2781954887,
+ "f":0.3162393162
+ },
+ "erased":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "etc":{
+ "p":0.9285714286,
+ "r":0.9285714286,
+ "f":0.9285714286
+ }
+ },
+ "ents_per_type":{
+ "DATE":{
+ "p":0.6931530008,
+ "r":0.8126858276,
+ "f":0.7481751825
+ },
+ "GPE":{
+ "p":0.792633015,
+ "r":0.8519061584,
+ "f":0.8212014134
+ },
+ "CARDINAL":{
+ "p":0.5527502254,
+ "r":0.6179435484,
+ "f":0.5835316516
+ },
+ "ORDINAL":{
+ "p":0.8287292818,
+ "r":0.7894736842,
+ "f":0.8086253369
+ },
+ "FAC":{
+ "p":0.5301204819,
+ "r":0.4731182796,
+ "f":0.5
+ },
+ "ORG":{
+ "p":0.7304479879,
+ "r":0.7321156773,
+ "f":0.7312808818
+ },
+ "LOC":{
+ "p":0.1899383984,
+ "r":0.497311828,
+ "f":0.2748885587
+ },
+ "NORP":{
+ "p":0.6797900262,
+ "r":0.5441176471,
+ "f":0.6044340723
+ },
+ "QUANTITY":{
+ "p":0.7363636364,
+ "r":0.6,
+ "f":0.6612244898
+ },
+ "PERSON":{
+ "p":0.8649842271,
+ "r":0.8833762887,
+ "f":0.8740835193
+ },
+ "TIME":{
+ "p":0.711627907,
+ "r":0.7427184466,
+ "f":0.7268408551
+ },
+ "WORK_OF_ART":{
+ "p":0.1849315068,
+ "r":0.18,
+ "f":0.1824324324
+ },
+ "MONEY":{
+ "p":0.8682170543,
+ "r":0.8296296296,
+ "f":0.8484848485
+ },
+ "EVENT":{
+ "p":0.5804195804,
+ "r":0.6102941176,
+ "f":0.5949820789
+ },
+ "PERCENT":{
+ "p":0.7640449438,
+ "r":0.8192771084,
+ "f":0.7906976744
+ },
+ "PRODUCT":{
+ "p":0.5384615385,
+ "r":0.1428571429,
+ "f":0.2258064516
+ },
+ "LAW":{
+ "p":0.3076923077,
+ "r":0.2666666667,
+ "f":0.2857142857
+ },
+ "LANGUAGE":{
+ "p":0.8181818182,
+ "r":1.0,
+ "f":0.9
+ }
+ }
+ },
+ "sources":[
+ {
+ "name":"OntoNotes 5",
+ "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
+ "license":"commercial (licensed by Explosion)",
+ "author":"Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston"
+ },
+ {
+ "name":"CoreNLP Universal Dependencies Converter",
+ "url":"https://nlp.stanford.edu/software/stanford-dependencies.html",
+ "author":"Stanford NLP Group",
+ "license":"Citation provided for reference, no code packaged with model"
+ },
+ {
+ "name":"bert-base-chinese",
+ "author":"Hugging Face",
+ "url":"https://huggingface.co/bert-base-chinese",
+ "license":""
+ }
+ ],
+ "requirements":[
+ "spacy-transformers>=1.0.3,<1.1.0",
+ "spacy-pkuseg>=0.0.27,<0.1.0"
+ ]
+ }
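
meta.json is what spaCy exposes as `nlp.meta` at runtime, so the same label and performance data can be read off a loaded pipeline. A short sketch, assuming the package is installed as in the earlier example:

```python
import spacy

nlp = spacy.load("zh_core_web_trf")

print(nlp.meta["name"], nlp.meta["version"])   # core_web_trf 3.1.0
print(nlp.pipe_names)                          # pipeline order from meta.json
print(nlp.pipe_labels["ner"])                  # the 18 OntoNotes entity types
print(nlp.meta["performance"]["ents_f"])       # same figure as in accuracy.json
```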
ner/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":1,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
ner/model ADDED
Binary file (314 kB).
ner/moves ADDED
@@ -0,0 +1 @@
+ ��moves��{"0":{},"1":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"2":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"3":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"4":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336,"":1},"5":{"":1}}�cfg��neg_key�
parser/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":30,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
parser/model ADDED
Binary file (460 kB).
parser/moves ADDED
@@ -0,0 +1 @@
+ ��moves��{"0":{"":406724},"1":{"":267280},"2":{"advmod":56948,"nsubj":53528,"compound:nn":43922,"dep":40127,"punct":36036,"case":23985,"nmod:assmod":21598,"nmod:prep":20097,"amod":16922,"acl":11976,"conj":10687,"cop":7237,"det":7209,"nummod":6994,"cc":6236,"aux:modal":5566,"nmod:tmod":5335,"nmod":4914,"neg":4364,"xcomp":3881,"appos":2954,"nmod:topic":2411,"discourse":2163,"advmod:loc":1590,"aux:prtmod":1539,"aux:ba":1311,"auxpass":1220,"advmod:dvp":1142,"advcl:loc":1046,"name":1032,"compound:vc":830,"nmod:poss":560,"amod:ordmod":511,"dobj":406,"nsubjpass":263,"nsubj:xsubj||ccomp":62,"parataxis:prnmod":34,"nsubj:xsubj":32},"3":{"punct":74030,"dobj":45389,"conj":30046,"case":30027,"dep":18664,"ccomp":17217,"mark":16601,"mark:clf":11552,"aux:asp":7896,"discourse":3998,"advmod:rcomp":2388,"nmod:range":1885,"cc":1675,"nmod:prep":1595,"advmod":1117,"etc":941,"compound:vc":790,"parataxis:prnmod":694,"advmod:loc":522,"neg":69,"advcl:loc":39,"acl":39},"4":{"ROOT":34547}}�cfg��neg_key�
tagger/cfg ADDED
@@ -0,0 +1,40 @@
+ {
+ "labels":[
+ "AD",
+ "AS",
+ "BA",
+ "CC",
+ "CD",
+ "CS",
+ "DEC",
+ "DEG",
+ "DER",
+ "DEV",
+ "DT",
+ "ETC",
+ "FW",
+ "IJ",
+ "INF",
+ "JJ",
+ "LB",
+ "LC",
+ "M",
+ "MSP",
+ "NN",
+ "NR",
+ "NT",
+ "OD",
+ "ON",
+ "P",
+ "PN",
+ "PU",
+ "SB",
+ "SP",
+ "URL",
+ "VA",
+ "VC",
+ "VE",
+ "VV",
+ "X"
+ ]
+ }
tagger/model ADDED
Binary file (111 kB).
tokenizer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "segmenter":"pkuseg"
+ }
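
The tokenizer config selects the pkuseg word segmenter, whose model files follow below; this is why `spacy-pkuseg` appears in the package requirements. For reference, a blank Chinese pipeline can be configured the same way (a sketch using the default "mixed" pkuseg weights rather than the bundled `assets/pkuseg_model`; requires `spacy-pkuseg` to be installed):

```python
from spacy.lang.zh import Chinese

# Configure the Chinese tokenizer to use pkuseg word segmentation.
cfg = {"nlp": {"tokenizer": {"segmenter": "pkuseg"}}}
nlp = Chinese.from_config(cfg)

# Initialize with the default pkuseg model; the shipped pipeline instead bundles
# its own weights (features.msgpack / weights.npz below).
nlp.tokenizer.initialize(pkuseg_model="mixed", pkuseg_user_dict="default")

print([t.text for t in nlp("今天天气很好")])
```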
tokenizer/pkuseg_model/features.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd4322482a7018b9bce9216173ae9d2848efe6d310b468bbb4383fb55c874a18
+ size 22685181
tokenizer/pkuseg_model/weights.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ada075eb25a854f71d6e6fa4e7d55e7be0ae049255b1f8f19d05c13b1b68c9e
+ size 37508754
tokenizer/pkuseg_processors ADDED
Binary file (4.53 MB).
transformer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "max_batch_items":4096
+ }
transformer/model/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "_name_or_path": "/mnt/scratch/tmp/zh_core_web_trf/d999ed1d-d7a0-4c09-b6e1-c8df4f70f55c/training/core/model-best/transformer/model",
+ "architectures": [
+ "BertForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "directionality": "bidi",
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "pooler_fc_size": 768,
+ "pooler_num_attention_heads": 12,
+ "pooler_num_fc_layers": 3,
+ "pooler_size_per_head": 128,
+ "pooler_type": "first_token_transform",
+ "position_embedding_type": "absolute",
+ "transformers_version": "4.6.1",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 21128
+ }
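
The values in config.json match the upstream `bert-base-chinese` release (12 layers, 12 heads, hidden size 768, vocab size 21128); only `_name_or_path` points at the temporary training directory. A quick cross-check sketch using the `transformers` library:

```python
from transformers import AutoConfig

# Fetch the reference configuration of the upstream model from the Hugging Face Hub.
ref = AutoConfig.from_pretrained("bert-base-chinese")

print(ref.num_hidden_layers, ref.num_attention_heads, ref.hidden_size, ref.vocab_size)
# Expected: 12 12 768 21128, matching the fine-tuned copy stored in this repo.
```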
transformer/model/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88b011d7e6facc2f530831168b8e50391dc0c6fa262867559a70bc73df8c3294
+ size 409149169
transformer/model/special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
transformer/model/tokenizer.json ADDED
The diff for this file is too large to render.
transformer/model/tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "/mnt/scratch/tmp/zh_core_web_trf/d999ed1d-d7a0-4c09-b6e1-c8df4f70f55c/training/core/model-best/transformer/model"}
transformer/model/vocab.txt ADDED
The diff for this file is too large to render.
vocab/key2row ADDED
@@ -0,0 +1 @@
+ �
vocab/lookups.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
+ size 1
vocab/strings.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e472d889257fe39453852761153a717a0aab7a968b8e1dbc1229acd1277cf83
+ size 1216265
vocab/vectors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14772b683e726436d5948ad3fff2b43d036ef2ebbe3458aafed6004e05a40706
+ size 128
zh_core_web_trf-any-py3-none-any.whl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e54306e18ab65a96dcd13dced77de99741ef9a8475f52d81c5f28fdea5c147eb
+ size 417437795