osanseviero committed
Commit 0afbbcc • 1 Parent(s): af75908

Update spaCy pipeline

.gitattributes CHANGED
@@ -14,3 +14,7 @@
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
+*.whl filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*strings.json filter=lfs diff=lfs merge=lfs -text
+vectors filter=lfs diff=lfs merge=lfs -text
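The four added patterns route wheel files, NumPy archives, `strings.json` files and anything named `vectors` through Git LFS. As a rough illustration only, the sketch below approximates which paths those patterns would catch. It assumes that a gitattributes pattern without a slash is compared against the file's basename, and it uses Python's `fnmatch`, which is not Git's actual matcher. The helper name `tracked_by_lfs` and the sample paths are hypothetical.

```python
import fnmatch
import posixpath

# Patterns added to .gitattributes in this commit.
ADDED_LFS_PATTERNS = ["*.whl", "*.npz", "*strings.json", "vectors"]


def tracked_by_lfs(path, patterns=ADDED_LFS_PATTERNS):
    """Approximate check: does the path's basename match any pattern?

    Assumption: a pattern without a slash is compared against the final
    path component. Git's real matching rules are richer than fnmatch.
    """
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical sample paths, chosen only to exercise the patterns.
    for sample in ["vocab/strings.json", "vocab/vectors", "config.cfg"]:
        print(sample, tracked_by_lfs(sample))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports what Git itself decides.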
LICENSE ADDED
@@ -0,0 +1,19 @@
+Copyright 2021 ExplosionAI GmbH
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
LICENSES_SOURCES ADDED
@@ -0,0 +1,60 @@
1
+ # OntoNotes 5
2
+
3
+ * Author: Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston
4
+ * URL: https://catalog.ldc.upenn.edu/LDC2013T19
5
+ * License: commercial (licensed by Explosion)
6
+
7
+ ```
8
+ ```
9
+
10
+
11
+
12
+
13
+ # CoreNLP Universal Dependencies Converter
14
+
15
+ * Author: Stanford NLP Group
16
+ * URL: https://nlp.stanford.edu/software/stanford-dependencies.html
17
+ * License: Citation provided for reference, no code packaged with model
18
+
19
+ ```
20
+ ```
21
+
22
+
23
+
24
+
25
+ # Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)
26
+
27
+ * Author: Explosion
28
+ * URL: https://spacy.io
29
+ * License: CC0
30
+
31
+ ```
32
+ The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
33
+
34
+ Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
35
+
36
+ For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.
37
+
38
+ 1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:
39
+
40
+ the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
41
+ moral rights retained by the original author(s) and/or performer(s);
42
+ publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
43
+ rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
44
+ rights protecting the extraction, dissemination, use and reuse of data in a Work;
45
+ database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
46
+ other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.
47
+ 2. Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.
48
+
49
+ 3. Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.
50
+
51
+ 4. Limitations and Disclaimers.
52
+
53
+ No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.
54
+ Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.
55
+ Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.
56
+ Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.```
57
+
58
+
59
+
60
+
README.md ADDED
@@ -0,0 +1,103 @@
+---
+tags:
+- spacy
+- token-classification
+language:
+- zh
+license: MIT
+model-index:
+- name: zh_core_web_md
+  results:
+  - tasks:
+      name: NER
+      type: token-classification
+    metrics:
+    - name: Precision
+      type: precision
+      value: 0.7220589964
+    - name: Recall
+      type: recall
+      value: 0.6751648352
+    - name: F Score
+      type: f_score
+      value: 0.6978249759
+  - tasks:
+      name: POS
+      type: token-classification
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.9004973002
+  - tasks:
+      name: SENTER
+      type: token-classification
+    metrics:
+    - name: Precision
+      type: precision
+      value: 0.7859447831
+    - name: Recall
+      type: recall
+      value: 0.7298152156
+    - name: F Score
+      type: f_score
+      value: 0.7568407423
+  - tasks:
+      name: UNLABELED_DEPENDENCIES
+      type: token-classification
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.7076909586
+  - tasks:
+      name: LABELED_DEPENDENCIES
+      type: token-classification
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.7076909586
+---
+### Details: https://spacy.io/models/zh#zh_core_web_md
+
+Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.
+
+| Feature | Description |
+| --- | --- |
+| **Name** | `zh_core_web_md` |
+| **Version** | `3.1.0` |
+| **spaCy** | `>=3.1.0,<3.2.0` |
+| **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `attribute_ruler`, `ner` |
+| **Components** | `tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `ner` |
+| **Vectors** | 500000 keys, 20000 unique vectors (300 dimensions) |
+| **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[CoreNLP Universal Dependencies Converter](https://nlp.stanford.edu/software/stanford-dependencies.html) (Stanford NLP Group)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
+| **License** | `MIT` |
+| **Author** | [Explosion](https://explosion.ai) |
+
+### Label Scheme
+
+<details>
+
+<summary>View label scheme (101 labels for 4 components)</summary>
+
+| Component | Labels |
+| --- | --- |
+| **`tagger`** | `AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X` |
+| **`parser`** | `ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp` |
+| **`senter`** | `I`, `S` |
+| **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART` |
+
+</details>
+
+### Accuracy
+
+| Type | Score |
+| --- | --- |
+| `TOKEN_ACC` | 97.88 |
+| `TAG_ACC` | 90.05 |
+| `DEP_UAS` | 70.77 |
+| `DEP_LAS` | 65.52 |
+| `ENTS_P` | 72.21 |
+| `ENTS_R` | 67.52 |
+| `ENTS_F` | 69.78 |
+| `SENTS_P` | 78.59 |
+| `SENTS_R` | 72.98 |
+| `SENTS_F` | 75.68 |
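The Accuracy table appears to report the raw 0-1 scores from the model-index front matter as percentages rounded to two decimals. A minimal sketch of that conversion, using values taken from this diff; the helper name `as_percent` is an assumption for illustration and not part of spaCy:

```python
def as_percent(score, digits=2):
    """Convert a 0-1 score to a percentage rounded to `digits` decimals."""
    return round(score * 100, digits)


# Raw values as listed in the README front matter of this commit.
front_matter = {
    "ENTS_P": 0.7220589964,
    "ENTS_R": 0.6751648352,
    "ENTS_F": 0.6978249759,
    "TAG_ACC": 0.9004973002,
    "DEP_UAS": 0.7076909586,
}

if __name__ == "__main__":
    for name, value in front_matter.items():
        print(f"{name}: {as_percent(value):.2f}")
```

Running the same conversion over every front-matter value is a quick way to compare the two parts of the README row by row.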
accuracy.json ADDED
@@ -0,0 +1,332 @@
1
+ {
2
+ "token_acc": 0.9788303388,
3
+ "tag_acc": 0.9004973002,
4
+ "dep_uas": 0.7076909586,
5
+ "dep_las": 0.6551856356,
6
+ "ents_p": 0.7220589964,
7
+ "ents_r": 0.6751648352,
8
+ "ents_f": 0.6978249759,
9
+ "sents_p": 0.7859447831,
10
+ "sents_r": 0.7298152156,
11
+ "sents_f": 0.7568407423,
12
+ "speed": 10063.789558808,
13
+ "dep_las_per_type": {
14
+ "dep": {
15
+ "p": 0.4941927991,
16
+ "r": 0.3439426089,
17
+ "f": 0.4056002383
18
+ },
19
+ "case": {
20
+ "p": 0.815348957,
21
+ "r": 0.7677012609,
22
+ "f": 0.790808043
23
+ },
24
+ "nmod:tmod": {
25
+ "p": 0.7291941876,
26
+ "r": 0.7510204082,
27
+ "f": 0.7399463807
28
+ },
29
+ "nummod": {
30
+ "p": 0.8324715615,
31
+ "r": 0.5363091272,
32
+ "f": 0.652350081
33
+ },
34
+ "mark:clf": {
35
+ "p": 0.923958962,
36
+ "r": 0.5710555763,
37
+ "f": 0.7058552328
38
+ },
39
+ "auxpass": {
40
+ "p": 0.8864864865,
41
+ "r": 0.8864864865,
42
+ "f": 0.8864864865
43
+ },
44
+ "nsubj": {
45
+ "p": 0.7838943894,
46
+ "r": 0.7293944233,
47
+ "f": 0.7556630186
48
+ },
49
+ "acl": {
50
+ "p": 0.7085714286,
51
+ "r": 0.5501941209,
52
+ "f": 0.6194192944
53
+ },
54
+ "advmod": {
55
+ "p": 0.8221938776,
56
+ "r": 0.7306733167,
57
+ "f": 0.7737366463
58
+ },
59
+ "mark": {
60
+ "p": 0.7447306792,
61
+ "r": 0.6967572305,
62
+ "f": 0.7199456645
63
+ },
64
+ "xcomp": {
65
+ "p": 0.7822878229,
66
+ "r": 0.6905537459,
67
+ "f": 0.7335640138
68
+ },
69
+ "nmod:assmod": {
70
+ "p": 0.7571008815,
71
+ "r": 0.7217553688,
72
+ "f": 0.7390057361
73
+ },
74
+ "det": {
75
+ "p": 0.8367670365,
76
+ "r": 0.618629174,
77
+ "f": 0.7113506231
78
+ },
79
+ "amod": {
80
+ "p": 0.7567811935,
81
+ "r": 0.6575019639,
82
+ "f": 0.7036569987
83
+ },
84
+ "nmod:prep": {
85
+ "p": 0.6989096025,
86
+ "r": 0.6010284332,
87
+ "f": 0.6462839486
88
+ },
89
+ "root": {
90
+ "p": 0.7426623746,
91
+ "r": 0.6529049442,
92
+ "f": 0.694897236
93
+ },
94
+ "aux:prtmod": {
95
+ "p": 0.9058823529,
96
+ "r": 0.825,
97
+ "f": 0.8635514019
98
+ },
99
+ "compound:nn": {
100
+ "p": 0.7339595888,
101
+ "r": 0.700676819,
102
+ "f": 0.716932133
103
+ },
104
+ "dobj": {
105
+ "p": 0.802248996,
106
+ "r": 0.7397422604,
107
+ "f": 0.76972873
108
+ },
109
+ "ccomp": {
110
+ "p": 0.6483430799,
111
+ "r": 0.6465785381,
112
+ "f": 0.6474596068
113
+ },
114
+ "advmod:rcomp": {
115
+ "p": 0.8196202532,
116
+ "r": 0.7174515235,
117
+ "f": 0.765140325
118
+ },
119
+ "nmod:topic": {
120
+ "p": 0.3596059113,
121
+ "r": 0.237012987,
122
+ "f": 0.2857142857
123
+ },
124
+ "cop": {
125
+ "p": 0.7555739059,
126
+ "r": 0.5888030888,
127
+ "f": 0.6618444846
128
+ },
129
+ "discourse": {
130
+ "p": 0.5577797998,
131
+ "r": 0.5057755776,
132
+ "f": 0.5305062743
133
+ },
134
+ "neg": {
135
+ "p": 0.8365527489,
136
+ "r": 0.6694411415,
137
+ "f": 0.7437252312
138
+ },
139
+ "aux:modal": {
140
+ "p": 0.8626198083,
141
+ "r": 0.8376421923,
142
+ "f": 0.8499475341
143
+ },
144
+ "nmod": {
145
+ "p": 0.7152,
146
+ "r": 0.6065128901,
147
+ "f": 0.6563876652
148
+ },
149
+ "aux:ba": {
150
+ "p": 0.8444444444,
151
+ "r": 0.8085106383,
152
+ "f": 0.8260869565
153
+ },
154
+ "advmod:loc": {
155
+ "p": 0.6130268199,
156
+ "r": 0.4747774481,
157
+ "f": 0.5351170569
158
+ },
159
+ "aux:asp": {
160
+ "p": 0.9095435685,
161
+ "r": 0.8740031898,
162
+ "f": 0.8914192761
163
+ },
164
+ "conj": {
165
+ "p": 0.5032329577,
166
+ "r": 0.5149338374,
167
+ "f": 0.5090161637
168
+ },
169
+ "nsubjpass": {
170
+ "p": 0.8292682927,
171
+ "r": 0.68,
172
+ "f": 0.7472527473
173
+ },
174
+ "compound:vc": {
175
+ "p": 0.4486486486,
176
+ "r": 0.4300518135,
177
+ "f": 0.4391534392
178
+ },
179
+ "advcl:loc": {
180
+ "p": 0.5945945946,
181
+ "r": 0.4714285714,
182
+ "f": 0.5258964143
183
+ },
184
+ "cc": {
185
+ "p": 0.7013108614,
186
+ "r": 0.6645962733,
187
+ "f": 0.6824601367
188
+ },
189
+ "advmod:dvp": {
190
+ "p": 0.8045112782,
191
+ "r": 0.6645962733,
192
+ "f": 0.7278911565
193
+ },
194
+ "appos": {
195
+ "p": 0.8658536585,
196
+ "r": 0.816091954,
197
+ "f": 0.8402366864
198
+ },
199
+ "name": {
200
+ "p": 0.5625,
201
+ "r": 0.4666666667,
202
+ "f": 0.5101214575
203
+ },
204
+ "parataxis:prnmod": {
205
+ "p": 0.5,
206
+ "r": 0.1278195489,
207
+ "f": 0.2035928144
208
+ },
209
+ "nmod:poss": {
210
+ "p": 0.6352941176,
211
+ "r": 0.4,
212
+ "f": 0.4909090909
213
+ },
214
+ "nsubj:xsubj": {
215
+ "p": 0.0,
216
+ "r": 0.0,
217
+ "f": 0.0
218
+ },
219
+ "nmod:range": {
220
+ "p": 0.7346153846,
221
+ "r": 0.6409395973,
222
+ "f": 0.6845878136
223
+ },
224
+ "amod:ordmod": {
225
+ "p": 0.6181818182,
226
+ "r": 0.53125,
227
+ "f": 0.5714285714
228
+ },
229
+ "erased": {
230
+ "p": 0.0,
231
+ "r": 0.0,
232
+ "f": 0.0
233
+ },
234
+ "etc": {
235
+ "p": 0.9268292683,
236
+ "r": 0.9047619048,
237
+ "f": 0.9156626506
238
+ }
239
+ },
240
+ "ents_per_type": {
241
+ "DATE": {
242
+ "p": 0.758780037,
243
+ "r": 0.8136769078,
244
+ "f": 0.7852702056
245
+ },
246
+ "GPE": {
247
+ "p": 0.7517889088,
248
+ "r": 0.8216031281,
249
+ "f": 0.7851471275
250
+ },
251
+ "ORDINAL": {
252
+ "p": 0.8720930233,
253
+ "r": 0.7894736842,
254
+ "f": 0.8287292818
255
+ },
256
+ "FAC": {
257
+ "p": 0.5076923077,
258
+ "r": 0.3548387097,
259
+ "f": 0.417721519
260
+ },
261
+ "PERSON": {
262
+ "p": 0.7917511832,
263
+ "r": 0.7545103093,
264
+ "f": 0.7726822831
265
+ },
266
+ "ORG": {
267
+ "p": 0.6896831844,
268
+ "r": 0.6461187215,
269
+ "f": 0.6671905697
270
+ },
271
+ "QUANTITY": {
272
+ "p": 0.7706422018,
273
+ "r": 0.6222222222,
274
+ "f": 0.6885245902
275
+ },
276
+ "CARDINAL": {
277
+ "p": 0.6181818182,
278
+ "r": 0.5141129032,
279
+ "f": 0.5613648872
280
+ },
281
+ "LOC": {
282
+ "p": 0.5247148289,
283
+ "r": 0.3709677419,
284
+ "f": 0.4346456693
285
+ },
286
+ "TIME": {
287
+ "p": 0.7209302326,
288
+ "r": 0.7524271845,
289
+ "f": 0.7363420428
290
+ },
291
+ "NORP": {
292
+ "p": 0.6646153846,
293
+ "r": 0.4537815126,
294
+ "f": 0.5393258427
295
+ },
296
+ "WORK_OF_ART": {
297
+ "p": 0.5733333333,
298
+ "r": 0.2866666667,
299
+ "f": 0.3822222222
300
+ },
301
+ "PRODUCT": {
302
+ "p": 0.2,
303
+ "r": 0.0612244898,
304
+ "f": 0.09375
305
+ },
306
+ "MONEY": {
307
+ "p": 0.9230769231,
308
+ "r": 0.8,
309
+ "f": 0.8571428571
310
+ },
311
+ "PERCENT": {
312
+ "p": 0.7613636364,
313
+ "r": 0.8072289157,
314
+ "f": 0.783625731
315
+ },
316
+ "EVENT": {
317
+ "p": 0.5688073394,
318
+ "r": 0.4558823529,
319
+ "f": 0.506122449
320
+ },
321
+ "LAW": {
322
+ "p": 0.4814814815,
323
+ "r": 0.2166666667,
324
+ "f": 0.2988505747
325
+ },
326
+ "LANGUAGE": {
327
+ "p": 0.6363636364,
328
+ "r": 0.7777777778,
329
+ "f": 0.7
330
+ }
331
+ }
332
+ }
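The `f` entries in this file look consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The short sketch below recomputes a few of them from the `p` and `r` values in the file. The function name `f_score` is illustrative, and this is not spaCy's scorer code.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f) triples copied from accuracy.json.
reported = {
    "ents": (0.7220589964, 0.6751648352, 0.6978249759),
    "sents": (0.7859447831, 0.7298152156, 0.7568407423),
    "LANGUAGE": (0.6363636364, 0.7777777778, 0.7),
    "nsubj:xsubj": (0.0, 0.0, 0.0),
}

if __name__ == "__main__":
    for name, (p, r, f) in reported.items():
        print(name, round(f_score(p, r), 6), f)
```

The zero guard matters for labels such as `nsubj:xsubj` and `erased`, where both precision and recall are 0.0.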
attribute_ruler/patterns ADDED
Binary file (1.93 kB)
config.cfg ADDED
@@ -0,0 +1,255 @@
+[paths]
+train = "corpus/zh-core-news/train.spacy"
+dev = "corpus/zh-core-news/dev.spacy"
+vectors = "corpus/zh_vectors"
+raw = null
+init_tok2vec = null
+vocab_data = null
+
+[system]
+gpu_allocator = null
+seed = 0
+
+[nlp]
+lang = "zh"
+pipeline = ["tok2vec","tagger","parser","senter","attribute_ruler","ner"]
+disabled = ["senter"]
+before_creation = null
+after_creation = null
+after_pipeline_creation = null
+batch_size = 256
+
+[nlp.tokenizer]
+@tokenizers = "spacy.zh.ChineseTokenizer"
+segmenter = "pkuseg"
+
+[components]
+
+[components.attribute_ruler]
+factory = "attribute_ruler"
+validate = false
+
+[components.ner]
+factory = "ner"
+incorrect_spans_key = null
+moves = null
+update_with_oracle_cut_size = 100
+
+[components.ner.model]
+@architectures = "spacy.TransitionBasedParser.v2"
+state_type = "ner"
+extra_state_tokens = false
+hidden_width = 64
+maxout_pieces = 2
+use_upper = true
+nO = null
+
+[components.ner.model.tok2vec]
+@architectures = "spacy.Tok2Vec.v2"
+
+[components.ner.model.tok2vec.embed]
+@architectures = "spacy.MultiHashEmbed.v2"
+width = 96
+attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
+rows = [5000,2500,2500,2500]
+include_static_vectors = true
+
+[components.ner.model.tok2vec.encode]
+@architectures = "spacy.MaxoutWindowEncoder.v2"
+width = 96
+depth = 4
+window_size = 1
+maxout_pieces = 3
+
+[components.parser]
+factory = "parser"
+learn_tokens = false
+min_action_freq = 30
+moves = null
+update_with_oracle_cut_size = 100
+
+[components.parser.model]
+@architectures = "spacy.TransitionBasedParser.v2"
+state_type = "parser"
+extra_state_tokens = false
+hidden_width = 64
+maxout_pieces = 2
+use_upper = true
+nO = null
+
+[components.parser.model.tok2vec]
+@architectures = "spacy.Tok2VecListener.v1"
+width = ${components.tok2vec.model.encode:width}
+upstream = "tok2vec"
+
+[components.senter]
+factory = "senter"
+
+[components.senter.model]
+@architectures = "spacy.Tagger.v1"
+nO = null
+
+[components.senter.model.tok2vec]
+@architectures = "spacy.Tok2Vec.v2"
+
+[components.senter.model.tok2vec.embed]
+@architectures = "spacy.MultiHashEmbed.v2"
+width = 16
+attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
+rows = [1000,500,500,500]
+include_static_vectors = true
+
+[components.senter.model.tok2vec.encode]
+@architectures = "spacy.MaxoutWindowEncoder.v2"
+width = 16
+depth = 2
+window_size = 1
+maxout_pieces = 2
+
+[components.tagger]
+factory = "tagger"
+
+[components.tagger.model]
+@architectures = "spacy.Tagger.v1"
+nO = null
+
+[components.tagger.model.tok2vec]
+@architectures = "spacy.Tok2VecListener.v1"
+width = ${components.tok2vec.model.encode:width}
+upstream = "tok2vec"
+
+[components.tok2vec]
+factory = "tok2vec"
+
+[components.tok2vec.model]
+@architectures = "spacy.Tok2Vec.v2"
+
+[components.tok2vec.model.embed]
+@architectures = "spacy.MultiHashEmbed.v2"
+width = ${components.tok2vec.model.encode:width}
+attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
+rows = [5000,2500,2500,2500]
+include_static_vectors = true
+
+[components.tok2vec.model.encode]
+@architectures = "spacy.MaxoutWindowEncoder.v2"
+width = 96
+depth = 4
+window_size = 1
+maxout_pieces = 3
+
+[corpora]
+
+[corpora.dev]
+@readers = "spacy.Corpus.v1"
+limit = 0
+max_length = 0
+path = ${paths:dev}
+gold_preproc = false
+augmenter = null
+
+[corpora.train]
+@readers = "spacy.Corpus.v1"
+path = ${paths:train}
+max_length = 5000
+gold_preproc = false
+limit = 0
+augmenter = null
+
+[training]
+train_corpus = "corpora.train"
+dev_corpus = "corpora.dev"
+seed = ${system:seed}
+gpu_allocator = ${system:gpu_allocator}
+dropout = 0.1
+accumulate_gradient = 1
+patience = 5000
+max_epochs = 0
+max_steps = 0
+eval_frequency = 1000
+frozen_components = []
+before_to_disk = null
+annotating_components = []
+
+[training.batcher]
+@batchers = "spacy.batch_by_words.v1"
+discard_oversize = false
+tolerance = 0.2
+get_length = null
+
+[training.batcher.size]
+@schedules = "compounding.v1"
+start = 100
+stop = 1000
+compound = 1.001
+t = 0.0
+
+[training.logger]
+@loggers = "spacy.WandbLogger.v1"
+project_name = "spacy-v3.0.0a2"
+remove_config_values = []
+
+[training.optimizer]
+@optimizers = "Adam.v1"
+beta1 = 0.9
+beta2 = 0.999
+L2_is_weight_decay = true
+L2 = 0.01
+grad_clip = 1.0
+use_averages = true
+eps = 0.00000001
+learn_rate = 0.001
+
+[training.score_weights]
+tag_acc = 0.24
+dep_uas = 0.0
+dep_las = 0.24
+dep_las_per_type = null
+sents_p = null
+sents_r = null
+sents_f = 0.03
+ents_f = 0.5
+ents_p = 0.0
+ents_r = 0.0
+ents_per_type = null
+
+[pretraining]
+
+[initialize]
+vocab_data = ${paths.vocab_data}
+vectors = ${paths.vectors}
+init_tok2vec = ${paths.init_tok2vec}
+before_init = null
+after_init = null
+
+[initialize.components]
+
+[initialize.components.ner]
+
+[initialize.components.ner.labels]
+@readers = "spacy.read_labels.v1"
+path = "corpus/labels/ner.json"
+require = false
+
+[initialize.components.parser]
+
+[initialize.components.parser.labels]
+@readers = "spacy.read_labels.v1"
+path = "corpus/labels/parser.json"
+require = false
+
+[initialize.components.tagger]
+
+[initialize.components.tagger.labels]
+@readers = "spacy.read_labels.v1"
+path = "corpus/labels/tagger.json"
+require = false
+
+[initialize.lookups]
+@misc = "spacy.LookupsDataLoader.v1"
+lang = ${nlp.lang}
+tables = []
+
+[initialize.tokenizer]
+pkuseg_model = "assets/pkuseg_model"
+pkuseg_user_dict = "default"
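The `[training.batcher.size]` block in this config uses a compounding schedule: the batch size target starts at 100 and is multiplied by 1.001 on each step until it is capped at 1000. The generator below is a plain-Python sketch of that behaviour under those assumptions, not Thinc's implementation; `compounding_sketch` is a hypothetical name and the `t` setting from the config is not modelled.

```python
import itertools


def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Assumes start <= stop and compound >= 1, as in the config above.
    """
    value = float(start)
    while True:
        yield min(value, float(stop))
        value *= compound


if __name__ == "__main__":
    sizes = compounding_sketch(start=100, stop=1000, compound=1.001)
    # Print the first three batch-size targets of the schedule.
    print(list(itertools.islice(sizes, 3)))
```

With these settings the target grows slowly, by 0.1% per step, so early updates use small batches and later ones settle at the cap.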
meta.json ADDED
@@ -0,0 +1,508 @@
1
+ {
2
+ "lang":"zh",
3
+ "name":"core_web_md",
4
+ "version":"3.1.0",
5
+ "description":"Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.",
6
+ "author":"Explosion",
7
+ "email":"contact@explosion.ai",
8
+ "url":"https://explosion.ai",
9
+ "license":"MIT",
10
+ "spacy_version":">=3.1.0,<3.2.0",
11
+ "spacy_git_version":"caba63b74",
12
+ "vectors":{
13
+ "width":300,
14
+ "vectors":20000,
15
+ "keys":500000,
16
+ "name":"zh_vectors"
17
+ },
18
+ "labels":{
19
+ "tok2vec":[
20
+
21
+ ],
22
+ "tagger":[
23
+ "AD",
24
+ "AS",
25
+ "BA",
26
+ "CC",
27
+ "CD",
28
+ "CS",
29
+ "DEC",
30
+ "DEG",
31
+ "DER",
32
+ "DEV",
33
+ "DT",
34
+ "ETC",
35
+ "FW",
36
+ "IJ",
37
+ "INF",
38
+ "JJ",
39
+ "LB",
40
+ "LC",
41
+ "M",
42
+ "MSP",
43
+ "NN",
44
+ "NR",
45
+ "NT",
46
+ "OD",
47
+ "ON",
48
+ "P",
49
+ "PN",
50
+ "PU",
51
+ "SB",
52
+ "SP",
53
+ "URL",
54
+ "VA",
55
+ "VC",
56
+ "VE",
57
+ "VV",
58
+ "X"
59
+ ],
60
+ "parser":[
61
+ "ROOT",
62
+ "acl",
63
+ "advcl:loc",
64
+ "advmod",
65
+ "advmod:dvp",
66
+ "advmod:loc",
67
+ "advmod:rcomp",
68
+ "amod",
69
+ "amod:ordmod",
70
+ "appos",
71
+ "aux:asp",
72
+ "aux:ba",
73
+ "aux:modal",
74
+ "aux:prtmod",
75
+ "auxpass",
76
+ "case",
77
+ "cc",
78
+ "ccomp",
79
+ "compound:nn",
80
+ "compound:vc",
81
+ "conj",
82
+ "cop",
83
+ "dep",
84
+ "det",
85
+ "discourse",
86
+ "dobj",
87
+ "etc",
88
+ "mark",
89
+ "mark:clf",
90
+ "name",
91
+ "neg",
92
+ "nmod",
93
+ "nmod:assmod",
94
+ "nmod:poss",
95
+ "nmod:prep",
96
+ "nmod:range",
97
+ "nmod:tmod",
98
+ "nmod:topic",
99
+ "nsubj",
100
+ "nsubj:xsubj",
101
+ "nsubjpass",
102
+ "nummod",
103
+ "parataxis:prnmod",
104
+ "punct",
105
+ "xcomp"
106
+ ],
107
+ "senter":[
108
+ "I",
109
+ "S"
110
+ ],
111
+ "attribute_ruler":[
112
+
113
+ ],
114
+ "ner":[
115
+ "CARDINAL",
116
+ "DATE",
117
+ "EVENT",
118
+ "FAC",
119
+ "GPE",
120
+ "LANGUAGE",
121
+ "LAW",
122
+ "LOC",
123
+ "MONEY",
124
+ "NORP",
125
+ "ORDINAL",
126
+ "ORG",
127
+ "PERCENT",
128
+ "PERSON",
129
+ "PRODUCT",
130
+ "QUANTITY",
131
+ "TIME",
132
+ "WORK_OF_ART"
133
+ ]
134
+ },
135
+ "pipeline":[
136
+ "tok2vec",
137
+ "tagger",
138
+ "parser",
139
+ "attribute_ruler",
140
+ "ner"
141
+ ],
142
+ "components":[
143
+ "tok2vec",
144
+ "tagger",
145
+ "parser",
146
+ "senter",
147
+ "attribute_ruler",
148
+ "ner"
149
+ ],
150
+ "disabled":[
151
+ "senter"
152
+ ],
153
+ "performance":{
154
+ "token_acc":0.9788303388,
155
+ "tag_acc":0.9004973002,
156
+ "dep_uas":0.7076909586,
157
+ "dep_las":0.6551856356,
158
+ "ents_p":0.7220589964,
159
+ "ents_r":0.6751648352,
160
+ "ents_f":0.6978249759,
161
+ "sents_p":0.7859447831,
162
+ "sents_r":0.7298152156,
163
+ "sents_f":0.7568407423,
164
+ "speed":10063.789558808,
+ "dep_las_per_type":{
+ "dep":{
+ "p":0.4941927991,
+ "r":0.3439426089,
+ "f":0.4056002383
+ },
+ "case":{
+ "p":0.815348957,
+ "r":0.7677012609,
+ "f":0.790808043
+ },
+ "nmod:tmod":{
+ "p":0.7291941876,
+ "r":0.7510204082,
+ "f":0.7399463807
+ },
+ "nummod":{
+ "p":0.8324715615,
+ "r":0.5363091272,
+ "f":0.652350081
+ },
+ "mark:clf":{
+ "p":0.923958962,
+ "r":0.5710555763,
+ "f":0.7058552328
+ },
+ "auxpass":{
+ "p":0.8864864865,
+ "r":0.8864864865,
+ "f":0.8864864865
+ },
+ "nsubj":{
+ "p":0.7838943894,
+ "r":0.7293944233,
+ "f":0.7556630186
+ },
+ "acl":{
+ "p":0.7085714286,
+ "r":0.5501941209,
+ "f":0.6194192944
+ },
+ "advmod":{
+ "p":0.8221938776,
+ "r":0.7306733167,
+ "f":0.7737366463
+ },
+ "mark":{
+ "p":0.7447306792,
+ "r":0.6967572305,
+ "f":0.7199456645
+ },
+ "xcomp":{
+ "p":0.7822878229,
+ "r":0.6905537459,
+ "f":0.7335640138
+ },
+ "nmod:assmod":{
+ "p":0.7571008815,
+ "r":0.7217553688,
+ "f":0.7390057361
+ },
+ "det":{
+ "p":0.8367670365,
+ "r":0.618629174,
+ "f":0.7113506231
+ },
+ "amod":{
+ "p":0.7567811935,
+ "r":0.6575019639,
+ "f":0.7036569987
+ },
+ "nmod:prep":{
+ "p":0.6989096025,
+ "r":0.6010284332,
+ "f":0.6462839486
+ },
+ "root":{
+ "p":0.7426623746,
+ "r":0.6529049442,
+ "f":0.694897236
+ },
+ "aux:prtmod":{
+ "p":0.9058823529,
+ "r":0.825,
+ "f":0.8635514019
+ },
+ "compound:nn":{
+ "p":0.7339595888,
+ "r":0.700676819,
+ "f":0.716932133
+ },
+ "dobj":{
+ "p":0.802248996,
+ "r":0.7397422604,
+ "f":0.76972873
+ },
+ "ccomp":{
+ "p":0.6483430799,
+ "r":0.6465785381,
+ "f":0.6474596068
+ },
+ "advmod:rcomp":{
+ "p":0.8196202532,
+ "r":0.7174515235,
+ "f":0.765140325
+ },
+ "nmod:topic":{
+ "p":0.3596059113,
+ "r":0.237012987,
+ "f":0.2857142857
+ },
+ "cop":{
+ "p":0.7555739059,
+ "r":0.5888030888,
+ "f":0.6618444846
+ },
+ "discourse":{
+ "p":0.5577797998,
+ "r":0.5057755776,
+ "f":0.5305062743
+ },
+ "neg":{
+ "p":0.8365527489,
+ "r":0.6694411415,
+ "f":0.7437252312
+ },
+ "aux:modal":{
+ "p":0.8626198083,
+ "r":0.8376421923,
+ "f":0.8499475341
+ },
+ "nmod":{
+ "p":0.7152,
+ "r":0.6065128901,
+ "f":0.6563876652
+ },
+ "aux:ba":{
+ "p":0.8444444444,
+ "r":0.8085106383,
+ "f":0.8260869565
+ },
+ "advmod:loc":{
+ "p":0.6130268199,
+ "r":0.4747774481,
+ "f":0.5351170569
+ },
+ "aux:asp":{
+ "p":0.9095435685,
+ "r":0.8740031898,
+ "f":0.8914192761
+ },
+ "conj":{
+ "p":0.5032329577,
+ "r":0.5149338374,
+ "f":0.5090161637
+ },
+ "nsubjpass":{
+ "p":0.8292682927,
+ "r":0.68,
+ "f":0.7472527473
+ },
+ "compound:vc":{
+ "p":0.4486486486,
+ "r":0.4300518135,
+ "f":0.4391534392
+ },
+ "advcl:loc":{
+ "p":0.5945945946,
+ "r":0.4714285714,
+ "f":0.5258964143
+ },
+ "cc":{
+ "p":0.7013108614,
+ "r":0.6645962733,
+ "f":0.6824601367
+ },
+ "advmod:dvp":{
+ "p":0.8045112782,
+ "r":0.6645962733,
+ "f":0.7278911565
+ },
+ "appos":{
+ "p":0.8658536585,
+ "r":0.816091954,
+ "f":0.8402366864
+ },
+ "name":{
+ "p":0.5625,
+ "r":0.4666666667,
+ "f":0.5101214575
+ },
+ "parataxis:prnmod":{
+ "p":0.5,
+ "r":0.1278195489,
+ "f":0.2035928144
+ },
+ "nmod:poss":{
+ "p":0.6352941176,
+ "r":0.4,
+ "f":0.4909090909
+ },
+ "nsubj:xsubj":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "nmod:range":{
+ "p":0.7346153846,
+ "r":0.6409395973,
+ "f":0.6845878136
+ },
+ "amod:ordmod":{
+ "p":0.6181818182,
+ "r":0.53125,
+ "f":0.5714285714
+ },
+ "erased":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "etc":{
+ "p":0.9268292683,
+ "r":0.9047619048,
+ "f":0.9156626506
+ }
+ },
+ "ents_per_type":{
+ "DATE":{
+ "p":0.758780037,
+ "r":0.8136769078,
+ "f":0.7852702056
+ },
+ "GPE":{
+ "p":0.7517889088,
+ "r":0.8216031281,
+ "f":0.7851471275
+ },
+ "ORDINAL":{
+ "p":0.8720930233,
+ "r":0.7894736842,
+ "f":0.8287292818
+ },
+ "FAC":{
+ "p":0.5076923077,
+ "r":0.3548387097,
+ "f":0.417721519
+ },
+ "PERSON":{
+ "p":0.7917511832,
+ "r":0.7545103093,
+ "f":0.7726822831
+ },
+ "ORG":{
+ "p":0.6896831844,
+ "r":0.6461187215,
+ "f":0.6671905697
+ },
+ "QUANTITY":{
+ "p":0.7706422018,
+ "r":0.6222222222,
+ "f":0.6885245902
+ },
+ "CARDINAL":{
+ "p":0.6181818182,
+ "r":0.5141129032,
+ "f":0.5613648872
+ },
+ "LOC":{
+ "p":0.5247148289,
+ "r":0.3709677419,
+ "f":0.4346456693
+ },
+ "TIME":{
+ "p":0.7209302326,
+ "r":0.7524271845,
+ "f":0.7363420428
+ },
+ "NORP":{
+ "p":0.6646153846,
+ "r":0.4537815126,
+ "f":0.5393258427
+ },
+ "WORK_OF_ART":{
+ "p":0.5733333333,
+ "r":0.2866666667,
+ "f":0.3822222222
+ },
+ "PRODUCT":{
+ "p":0.2,
+ "r":0.0612244898,
+ "f":0.09375
+ },
+ "MONEY":{
+ "p":0.9230769231,
+ "r":0.8,
+ "f":0.8571428571
+ },
+ "PERCENT":{
+ "p":0.7613636364,
+ "r":0.8072289157,
+ "f":0.783625731
+ },
+ "EVENT":{
+ "p":0.5688073394,
+ "r":0.4558823529,
+ "f":0.506122449
+ },
+ "LAW":{
+ "p":0.4814814815,
+ "r":0.2166666667,
+ "f":0.2988505747
+ },
+ "LANGUAGE":{
+ "p":0.6363636364,
+ "r":0.7777777778,
+ "f":0.7
+ }
+ }
+ },
+ "sources":[
+ {
+ "name":"OntoNotes 5",
+ "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
+ "license":"commercial (licensed by Explosion)",
+ "author":"Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston"
+ },
+ {
+ "name":"CoreNLP Universal Dependencies Converter",
+ "url":"https://nlp.stanford.edu/software/stanford-dependencies.html",
+ "author":"Stanford NLP Group",
+ "license":"Citation provided for reference, no code packaged with model"
+ },
+ {
+ "name":"Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)",
+ "url":"https://spacy.io",
+ "license":"CC0",
+ "author":"Explosion"
+ }
+ ],
+ "requirements":[
+ "spacy-pkuseg>=0.0.27,<0.1.0"
+ ]
+ }
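Note on the "performance" block above: each per-type entry reports precision ("p"), recall ("r"), and an F-score ("f"). The sketch below assumes "f" is the standard F1 (harmonic mean of p and r) and spot-checks a few values copied from the diff. The helper name `f_score` and the tolerance are illustrative choices, not part of the packaged pipeline.

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# (p, r, reported f) triples copied from the meta.json diff above.
reported = {
    "MONEY": (0.9230769231, 0.8, 0.8571428571),
    "LANGUAGE": (0.6363636364, 0.7777777778, 0.7),
    "nsubj:xsubj": (0.0, 0.0, 0.0),
}

# True where the recomputed F1 agrees with the reported value (assumed tolerance).
checks = {
    name: abs(f_score(p, r) - f) < 1e-6
    for name, (p, r, f) in reported.items()
}

for name, ok in checks.items():
    print(name, "consistent" if ok else "mismatch")
```

Labels such as "nsubj:xsubj" and "erased" score 0.0 on all three metrics, which is why the helper guards against a zero denominator.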
ner/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":1,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
ner/model ADDED
Binary file (6.96 MB)
ner/moves ADDED
@@ -0,0 +1 @@
+ moves {"0":{},"1":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"2":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"3":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"4":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336,"":1},"5":{"":1}} cfg neg_key
parser/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":30,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
parser/model ADDED
Binary file (309 kB)
parser/moves ADDED
@@ -0,0 +1 @@
+ moves {"0":{"":406716},"1":{"":267231},"2":{"advmod":56960,"nsubj":53520,"compound:nn":43919,"dep":40111,"punct":36035,"case":23986,"nmod:assmod":21599,"nmod:prep":20098,"amod":16922,"acl":11979,"conj":10687,"cop":7238,"det":7210,"nummod":6994,"cc":6235,"aux:modal":5566,"nmod:tmod":5335,"nmod":4915,"neg":4363,"xcomp":3881,"appos":2955,"nmod:topic":2410,"discourse":2163,"advmod:loc":1591,"aux:prtmod":1539,"aux:ba":1311,"auxpass":1220,"advmod:dvp":1142,"advcl:loc":1046,"name":1032,"compound:vc":830,"nmod:poss":560,"amod:ordmod":511,"dobj":406,"nsubjpass":263,"nsubj:xsubj||ccomp":62,"parataxis:prnmod":34,"nsubj:xsubj":32},"3":{"punct":74006,"dobj":45383,"conj":30040,"case":30024,"dep":18660,"ccomp":17216,"mark":16600,"mark:clf":11551,"aux:asp":7896,"discourse":3998,"advmod:rcomp":2387,"nmod:range":1885,"cc":1675,"nmod:prep":1595,"advmod":1116,"etc":941,"compound:vc":790,"parataxis:prnmod":693,"advmod:loc":522,"neg":69,"advcl:loc":39,"acl":39},"4":{"ROOT":34525}} cfg neg_key
senter/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
senter/model ADDED
Binary file (213 kB)
tagger/cfg ADDED
@@ -0,0 +1,40 @@
+ {
+ "labels":[
+ "AD",
+ "AS",
+ "BA",
+ "CC",
+ "CD",
+ "CS",
+ "DEC",
+ "DEG",
+ "DER",
+ "DEV",
+ "DT",
+ "ETC",
+ "FW",
+ "IJ",
+ "INF",
+ "JJ",
+ "LB",
+ "LC",
+ "M",
+ "MSP",
+ "NN",
+ "NR",
+ "NT",
+ "OD",
+ "ON",
+ "P",
+ "PN",
+ "PU",
+ "SB",
+ "SP",
+ "URL",
+ "VA",
+ "VC",
+ "VE",
+ "VV",
+ "X"
+ ]
+ }
tagger/model ADDED
Binary file (14.3 kB)
tok2vec/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
tok2vec/model ADDED
Binary file (6.81 MB)
tokenizer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "segmenter":"pkuseg"
+ }
tokenizer/pkuseg_model/features.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd4322482a7018b9bce9216173ae9d2848efe6d310b468bbb4383fb55c874a18
+ size 22685181
tokenizer/pkuseg_model/weights.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ada075eb25a854f71d6e6fa4e7d55e7be0ae049255b1f8f19d05c13b1b68c9e
+ size 37508754
tokenizer/pkuseg_processors ADDED
Binary file (4.53 MB)
vocab/key2row ADDED
Binary file (5.99 MB)
vocab/lookups.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
+ size 1
vocab/strings.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:401539f9b54cffa79ffd8de96bdd43f4a6caff75dbb63a9cb3655696190fcfb6
+ size 9845085
vocab/vectors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8e94a33f5749fc0203883b41ba61814c42dceab6b577d9376fc0565de39b92a
+ size 24000128
zh_core_web_md-any-py3-none-any.whl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98a43a118ec771eca56ac0831b666deb509bc188fb75d65c566bce6e6aea8263
+ size 78817834
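Note on the Git LFS entries above: the large files in this commit are stored as three-line pointer files ("version", "oid", "size") rather than as the binary content itself. The sketch below shows one way such a pointer could be parsed, assuming each line is a key followed by a single space and a value. The function name `parse_lfs_pointer` and the returned field names are hypothetical, and the sample text is the wheel pointer from this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:98a43a118ec771eca56ac0831b666deb509bc188fb75d65c566bce6e6aea8263
size 78817834
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

Comparing `info["digest"]` with a locally computed SHA-256 of the downloaded file is one way to confirm that the binary matches the pointer.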