EC2 Default User committed on
Commit
878a400
1 Parent(s): 16a00d2

Update spaCy pipeline

README.md CHANGED
@@ -14,47 +14,62 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.7402422611
18
  - name: NER Recall
19
  type: recall
20
- value: 0.6918238994
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.7152145644
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
- - name: POS Accuracy
29
  type: accuracy
30
- value: 0.9715755942
31
  - task:
32
- name: SENTER
33
  type: token-classification
34
  metrics:
35
- - name: SENTER Precision
36
- type: precision
37
- value: 0.9862204724
38
- - name: SENTER Recall
39
- type: recall
40
- value: 0.9881656805
41
- - name: SENTER F Score
42
- type: f_score
43
- value: 0.9871921182
44
  - task:
45
- name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
- - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.9214150689
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
- - name: Labeled Dependencies Accuracy
56
- type: accuracy
57
- value: 0.9214150689
58
  ---
59
  ### Details: https://spacy.io/models/ja#ja_core_news_lg
60
 
@@ -63,8 +78,8 @@ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser,
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `ja_core_news_lg` |
66
- | **Version** | `3.2.0` |
67
- | **spaCy** | `>=3.2.0,<3.3.0` |
68
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `ner` |
69
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `ner` |
70
  | **Vectors** | 480443 keys, 480443 unique vectors (300 dimensions) |
@@ -76,13 +91,12 @@ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser,
76
 
77
  <details>
78
 
79
- <summary>View label scheme (66 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
  | **`morphologizer`** | `POS=NOUN`, `POS=ADP`, `POS=VERB`, `POS=SCONJ`, `POS=AUX`, `POS=PUNCT`, `POS=PART`, `POS=DET`, `POS=NUM`, `POS=ADV`, `POS=PRON`, `POS=ADJ`, `POS=PROPN`, `POS=CCONJ`, `POS=SYM`, `POS=NOUN\|Polarity=Neg`, `POS=AUX\|Polarity=Neg`, `POS=INTJ`, `POS=SCONJ\|Polarity=Neg` |
84
  | **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
85
- | **`senter`** | `I`, `S` |
86
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
87
 
88
  </details>
@@ -95,18 +109,18 @@ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser,
95
  | `TOKEN_P` | 97.65 |
96
  | `TOKEN_R` | 97.90 |
97
  | `TOKEN_F` | 97.77 |
98
- | `POS_ACC` | 97.36 |
99
- | `MORPH_ACC` | 0.40 |
100
  | `MORPH_MICRO_P` | 34.01 |
101
  | `MORPH_MICRO_R` | 98.04 |
102
  | `MORPH_MICRO_F` | 50.51 |
103
- | `SENTS_P` | 98.62 |
104
- | `SENTS_R` | 98.82 |
105
- | `SENTS_F` | 98.72 |
106
- | `DEP_UAS` | 92.14 |
107
- | `DEP_LAS` | 90.81 |
108
- | `TAG_ACC` | 97.16 |
109
- | `LEMMA_ACC` | 96.59 |
110
- | `ENTS_P` | 74.02 |
111
- | `ENTS_R` | 69.18 |
112
- | `ENTS_F` | 71.52 |
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.7611940299
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.7056603774
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.7323759791
24
+ - task:
25
+ name: TAG
26
+ type: token-classification
27
+ metrics:
28
+ - name: TAG (XPOS) Accuracy
29
+ type: accuracy
30
+ value: 0.9712488769
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
+ - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.9745981102
38
  - task:
39
+ name: MORPH
40
  type: token-classification
41
  metrics:
42
+ - name: Morph (UFeats) Accuracy
43
+ type: accuracy
44
+ value: 0.0
45
  - task:
46
+ name: LEMMA
47
  type: token-classification
48
  metrics:
49
+ - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.965013864
52
+ - task:
53
+ name: UNLABELED_DEPENDENCIES
54
+ type: token-classification
55
+ metrics:
56
+ - name: Unlabeled Attachment Score (UAS)
57
+ type: f_score
58
+ value: 0.9222457341
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
+ - name: Labeled Attachment Score (LAS)
64
+ type: f_score
65
+ value: 0.9090090901
66
+ - task:
67
+ name: SENTS
68
+ type: token-classification
69
+ metrics:
70
+ - name: Sentences F-Score
71
+ type: f_score
72
+ value: 0.9862475442
73
  ---
74
  ### Details: https://spacy.io/models/ja#ja_core_news_lg
75
 
 
78
  | Feature | Description |
79
  | --- | --- |
80
  | **Name** | `ja_core_news_lg` |
81
+ | **Version** | `3.3.0` |
82
+ | **spaCy** | `>=3.3.0.dev0,<3.4.0` |
83
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `ner` |
84
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `ner` |
85
  | **Vectors** | 480443 keys, 480443 unique vectors (300 dimensions) |
 
91
 
92
  <details>
93
 
94
+ <summary>View label scheme (64 labels for 3 components)</summary>
95
 
96
  | Component | Labels |
97
  | --- | --- |
98
  | **`morphologizer`** | `POS=NOUN`, `POS=ADP`, `POS=VERB`, `POS=SCONJ`, `POS=AUX`, `POS=PUNCT`, `POS=PART`, `POS=DET`, `POS=NUM`, `POS=ADV`, `POS=PRON`, `POS=ADJ`, `POS=PROPN`, `POS=CCONJ`, `POS=SYM`, `POS=NOUN\|Polarity=Neg`, `POS=AUX\|Polarity=Neg`, `POS=INTJ`, `POS=SCONJ\|Polarity=Neg` |
99
  | **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
 
100
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
101
 
102
  </details>
 
109
  | `TOKEN_P` | 97.65 |
110
  | `TOKEN_R` | 97.90 |
111
  | `TOKEN_F` | 97.77 |
112
+ | `POS_ACC` | 97.46 |
113
+ | `MORPH_ACC` | 0.00 |
114
  | `MORPH_MICRO_P` | 34.01 |
115
  | `MORPH_MICRO_R` | 98.04 |
116
  | `MORPH_MICRO_F` | 50.51 |
117
+ | `SENTS_P` | 98.24 |
118
+ | `SENTS_R` | 99.01 |
119
+ | `SENTS_F` | 98.62 |
120
+ | `DEP_UAS` | 92.22 |
121
+ | `DEP_LAS` | 90.90 |
122
+ | `TAG_ACC` | 97.12 |
123
+ | `LEMMA_ACC` | 96.50 |
124
+ | `ENTS_P` | 76.12 |
125
+ | `ENTS_R` | 70.57 |
126
+ | `ENTS_F` | 73.24 |
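
Note on the README hunks above: the accuracy table appears to report the same scores as the `model-index` front matter, scaled to percentages and rounded to two decimals. The following is a minimal Python sketch that checks this for the new (3.3.0) values copied from this diff. The dictionary names and the `as_percent` helper are illustrative assumptions, not part of the repository.

```python
# Raw scores copied from the new (3.3.0) model-index block in this diff.
raw_scores = {
    "ENTS_P": 0.7611940299,
    "ENTS_R": 0.7056603774,
    "ENTS_F": 0.7323759791,
    "TAG_ACC": 0.9712488769,
    "POS_ACC": 0.9745981102,
    "MORPH_ACC": 0.0,
    "LEMMA_ACC": 0.965013864,
    "DEP_UAS": 0.9222457341,
    "DEP_LAS": 0.9090090901,
    "SENTS_F": 0.9862475442,
}

# Percentages as printed in the new README accuracy table.
readme_table = {
    "ENTS_P": "76.12",
    "ENTS_R": "70.57",
    "ENTS_F": "73.24",
    "TAG_ACC": "97.12",
    "POS_ACC": "97.46",
    "MORPH_ACC": "0.00",
    "LEMMA_ACC": "96.50",
    "DEP_UAS": "92.22",
    "DEP_LAS": "90.90",
    "SENTS_F": "98.62",
}


def as_percent(score: float) -> str:
    """Scale a 0-1 score to a percentage string with two decimals."""
    return f"{score * 100:.2f}"


# Collect every entry whose rounded percentage differs from the table.
mismatches = {
    name: (as_percent(score), readme_table[name])
    for name, score in raw_scores.items()
    if as_percent(score) != readme_table[name]
}
print(mismatches or "all table entries match the raw scores")
```
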
accuracy.json CHANGED
@@ -3,8 +3,8 @@
3
  "token_p": 0.9764591282,
4
  "token_r": 0.9790021974,
5
  "token_f": 0.9777290092,
6
- "pos_acc": 0.9736163946,
7
- "morph_acc": 0.0040005162,
8
  "morph_micro_p": 0.3401360544,
9
  "morph_micro_r": 0.9803921569,
10
  "morph_micro_f": 0.5050505051,
@@ -25,219 +25,224 @@
25
  "f": 0.0
26
  }
27
  },
28
- "sents_p": 0.9862204724,
29
- "sents_r": 0.9881656805,
30
- "sents_f": 0.9871921182,
31
- "dep_uas": 0.9214150689,
32
- "dep_las": 0.9080930316,
33
  "dep_las_per_type": {
34
  "cc": {
35
- "p": 0.75,
36
- "r": 0.75,
37
- "f": 0.75
38
  },
39
  "compound": {
40
- "p": 0.9486581097,
41
- "r": 0.916572717,
42
- "f": 0.9323394495
43
  },
44
  "obl": {
45
- "p": 0.8233082707,
46
- "r": 0.8202247191,
47
- "f": 0.8217636023
48
  },
49
  "case": {
50
- "p": 0.9892679187,
51
- "r": 0.9806231003,
52
- "f": 0.9849265407
53
  },
54
  "dislocated": {
55
- "p": 0.5,
56
- "r": 0.4615384615,
57
- "f": 0.48
58
  },
59
  "nsubj": {
60
- "p": 0.8281853282,
61
- "r": 0.8234165067,
62
- "f": 0.8257940327
63
  },
64
  "nmod": {
65
- "p": 0.87875,
66
- "r": 0.8222222222,
67
- "f": 0.8495468278
68
  },
69
  "root": {
70
- "p": 0.9560878244,
71
- "r": 0.9447731755,
72
- "f": 0.9503968254
73
  },
74
  "aux": {
75
- "p": 0.9751381215,
76
- "r": 0.9832869081,
77
- "f": 0.9791955617
78
  },
79
  "advcl": {
80
- "p": 0.6810933941,
81
- "r": 0.6719101124,
82
- "f": 0.6764705882
83
  },
84
  "mark": {
85
- "p": 0.971659919,
86
- "r": 0.96,
87
- "f": 0.9657947686
88
  },
89
  "fixed": {
90
- "p": 0.9571428571,
91
- "r": 0.9745454545,
92
- "f": 0.9657657658
93
  },
94
  "acl": {
95
- "p": 0.8492239468,
96
- "r": 0.8417582418,
97
- "f": 0.8454746137
98
  },
99
  "obj": {
100
- "p": 0.9662576687,
101
- "r": 0.9516616314,
102
- "f": 0.9589041096
103
  },
104
  "nummod": {
105
- "p": 0.9806451613,
106
  "r": 0.899408284,
107
- "f": 0.9382716049
108
  },
109
  "advmod": {
110
- "p": 0.6691729323,
111
- "r": 0.6357142857,
112
- "f": 0.652014652
113
  },
114
  "amod": {
115
- "p": 0.9310344828,
116
- "r": 0.7297297297,
117
- "f": 0.8181818182
118
  },
119
  "cop": {
120
- "p": 0.9634146341,
121
- "r": 0.9186046512,
122
- "f": 0.9404761905
123
  },
124
  "ccomp": {
125
- "p": 0.8571428571,
126
- "r": 0.8181818182,
127
- "f": 0.8372093023
128
- },
129
- "csubj": {
130
- "p": 0.4444444444,
131
- "r": 0.6666666667,
132
- "f": 0.5333333333
133
  },
134
  "det": {
135
- "p": 0.9807692308,
136
- "r": 0.9622641509,
137
- "f": 0.9714285714
138
  },
139
  "dep": {
140
- "p": 0.0769230769,
141
- "r": 0.1428571429,
142
- "f": 0.1
143
  }
144
  },
145
- "tag_acc": 0.9715755942,
146
- "lemma_acc": 0.9659109444,
147
- "ents_p": 0.7402422611,
148
- "ents_r": 0.6918238994,
149
- "ents_f": 0.7152145644,
150
  "ents_per_type": {
151
  "DATE": {
152
- "p": 0.9553571429,
153
- "r": 0.9816513761,
154
- "f": 0.9683257919
155
  },
156
  "ORG": {
157
- "p": 0.5916666667,
158
- "r": 0.5182481752,
159
- "f": 0.5525291829
160
  },
161
  "PERSON": {
162
- "p": 0.7816901408,
163
- "r": 0.7985611511,
164
- "f": 0.7900355872
165
  },
166
  "GPE": {
167
- "p": 0.6774193548,
168
- "r": 0.670212766,
169
- "f": 0.6737967914
170
  },
171
- "QUANTITY": {
172
- "p": 0.8194444444,
173
- "r": 0.8939393939,
174
- "f": 0.8550724638
175
  },
176
  "TIME": {
177
- "p": 0.6666666667,
178
  "r": 1.0,
179
- "f": 0.8
180
  },
181
  "NORP": {
182
- "p": 0.7407407407,
183
  "r": 0.625,
184
- "f": 0.6779661017
185
  },
186
  "ORDINAL": {
187
- "p": 0.56,
188
  "r": 0.6363636364,
189
- "f": 0.5957446809
190
  },
191
  "TITLE_AFFIX": {
192
- "p": 0.7916666667,
193
- "r": 0.6333333333,
194
- "f": 0.7037037037
195
  },
196
  "WORK_OF_ART": {
197
- "p": 0.75,
198
  "r": 0.7058823529,
199
- "f": 0.7272727273
200
- },
201
- "EVENT": {
202
- "p": 0.8823529412,
203
- "r": 0.5769230769,
204
- "f": 0.6976744186
205
  },
206
  "PERCENT": {
207
  "p": 1.0,
208
  "r": 0.2857142857,
209
  "f": 0.4444444444
210
  },
211
  "CARDINAL": {
212
  "p": 0.0,
213
  "r": 0.0,
214
  "f": 0.0
215
  },
216
- "FAC": {
217
- "p": 0.5666666667,
218
- "r": 0.4594594595,
219
- "f": 0.5074626866
220
- },
221
  "LOC": {
222
  "p": 0.5,
223
  "r": 0.8,
224
  "f": 0.6153846154
225
  },
226
  "MOVEMENT": {
227
  "p": 0.0,
228
  "r": 0.0,
229
  "f": 0.0
230
  },
231
- "PRODUCT": {
232
- "p": 0.5384615385,
233
- "r": 0.3333333333,
234
- "f": 0.4117647059
235
- },
236
  "LAW": {
237
  "p": 1.0,
238
  "r": 0.3333333333,
239
  "f": 0.5
240
  },
241
  "MONEY": {
242
  "p": 1.0,
243
  "r": 1.0,
@@ -249,5 +254,5 @@
249
  "f": 1.0
250
  }
251
  },
252
- "speed": 4912.7299798978
253
  }
 
3
  "token_p": 0.9764591282,
4
  "token_r": 0.9790021974,
5
  "token_f": 0.9777290092,
6
+ "pos_acc": 0.9745981102,
7
+ "morph_acc": 0.0,
8
  "morph_micro_p": 0.3401360544,
9
  "morph_micro_r": 0.9803921569,
10
  "morph_micro_f": 0.5050505051,
 
25
  "f": 0.0
26
  }
27
  },
28
+ "sents_p": 0.9823874755,
29
+ "sents_r": 0.9901380671,
30
+ "sents_f": 0.9862475442,
31
+ "dep_uas": 0.9222457341,
32
+ "dep_las": 0.9090090901,
33
  "dep_las_per_type": {
34
  "cc": {
35
+ "p": 0.7755102041,
36
+ "r": 0.7916666667,
37
+ "f": 0.7835051546
38
  },
39
  "compound": {
40
+ "p": 0.9439906651,
41
+ "r": 0.9120631342,
42
+ "f": 0.9277522936
43
  },
44
  "obl": {
45
+ "p": 0.8172715895,
46
+ "r": 0.8152309613,
47
+ "f": 0.81625
48
  },
49
  "case": {
50
+ "p": 0.9900230238,
51
+ "r": 0.9802431611,
52
+ "f": 0.9851088202
53
  },
54
  "dislocated": {
55
+ "p": 0.7,
56
+ "r": 0.5384615385,
57
+ "f": 0.6086956522
58
  },
59
  "nsubj": {
60
+ "p": 0.8349514563,
61
+ "r": 0.8253358925,
62
+ "f": 0.8301158301
63
  },
64
  "nmod": {
65
+ "p": 0.8793532338,
66
+ "r": 0.8269005848,
67
+ "f": 0.8523206751
68
  },
69
  "root": {
70
+ "p": 0.958250497,
71
+ "r": 0.9506903353,
72
+ "f": 0.9544554455
73
  },
74
  "aux": {
75
+ "p": 0.9851301115,
76
+ "r": 0.9842154132,
77
+ "f": 0.9846725499
78
  },
79
  "advcl": {
80
+ "p": 0.693877551,
81
+ "r": 0.6876404494,
82
+ "f": 0.690744921
83
  },
84
  "mark": {
85
+ "p": 0.9775510204,
86
+ "r": 0.958,
87
+ "f": 0.9676767677
88
  },
89
  "fixed": {
90
+ "p": 0.9536541889,
91
+ "r": 0.9727272727,
92
+ "f": 0.9630963096
93
  },
94
  "acl": {
95
+ "p": 0.8409586057,
96
+ "r": 0.8483516484,
97
+ "f": 0.8446389497
98
  },
99
  "obj": {
100
+ "p": 0.9541284404,
101
+ "r": 0.9425981873,
102
+ "f": 0.9483282675
103
  },
104
  "nummod": {
105
+ "p": 0.987012987,
106
  "r": 0.899408284,
107
+ "f": 0.9411764706
108
  },
109
  "advmod": {
110
+ "p": 0.6904761905,
111
+ "r": 0.6214285714,
112
+ "f": 0.6541353383
113
  },
114
  "amod": {
115
+ "p": 0.962962963,
116
+ "r": 0.7027027027,
117
+ "f": 0.8125
118
  },
119
  "cop": {
120
+ "p": 0.9534883721,
121
+ "r": 0.9534883721,
122
+ "f": 0.9534883721
123
  },
124
  "ccomp": {
125
+ "p": 0.68,
126
+ "r": 0.7727272727,
127
+ "f": 0.7234042553
128
  },
129
  "det": {
130
+ "p": 1.0,
131
+ "r": 0.9811320755,
132
+ "f": 0.9904761905
133
+ },
134
+ "csubj": {
135
+ "p": 0.6923076923,
136
+ "r": 0.75,
137
+ "f": 0.72
138
  },
139
  "dep": {
140
+ "p": 0.0,
141
+ "r": 0.0,
142
+ "f": 0.0
143
  }
144
  },
145
+ "tag_acc": 0.9712488769,
146
+ "lemma_acc": 0.965013864,
147
+ "ents_p": 0.7611940299,
148
+ "ents_r": 0.7056603774,
149
+ "ents_f": 0.7323759791,
150
  "ents_per_type": {
151
  "DATE": {
152
+ "p": 0.9541284404,
153
+ "r": 0.9541284404,
154
+ "f": 0.9541284404
155
  },
156
  "ORG": {
157
+ "p": 0.6639344262,
158
+ "r": 0.5912408759,
159
+ "f": 0.6254826255
160
  },
161
  "PERSON": {
162
+ "p": 0.7714285714,
163
+ "r": 0.7769784173,
164
+ "f": 0.7741935484
165
  },
166
  "GPE": {
167
+ "p": 0.7272727273,
168
+ "r": 0.6808510638,
169
+ "f": 0.7032967033
170
  },
171
+ "PRODUCT": {
172
+ "p": 0.5,
173
+ "r": 0.4047619048,
174
+ "f": 0.4473684211
175
  },
176
  "TIME": {
177
+ "p": 0.5714285714,
178
  "r": 1.0,
179
+ "f": 0.7272727273
180
+ },
181
+ "QUANTITY": {
182
+ "p": 0.884057971,
183
+ "r": 0.9242424242,
184
+ "f": 0.9037037037
185
  },
186
  "NORP": {
187
+ "p": 0.8,
188
  "r": 0.625,
189
+ "f": 0.701754386
190
  },
191
  "ORDINAL": {
192
+ "p": 0.7,
193
  "r": 0.6363636364,
194
+ "f": 0.6666666667
195
  },
196
  "TITLE_AFFIX": {
197
+ "p": 0.7407407407,
198
+ "r": 0.6666666667,
199
+ "f": 0.701754386
200
  },
201
  "WORK_OF_ART": {
202
+ "p": 0.8571428571,
203
  "r": 0.7058823529,
204
+ "f": 0.7741935484
205
  },
206
  "PERCENT": {
207
  "p": 1.0,
208
  "r": 0.2857142857,
209
  "f": 0.4444444444
210
  },
211
+ "EVENT": {
212
+ "p": 0.75,
213
+ "r": 0.6923076923,
214
+ "f": 0.72
215
+ },
216
  "CARDINAL": {
217
  "p": 0.0,
218
  "r": 0.0,
219
  "f": 0.0
220
  },
221
  "LOC": {
222
  "p": 0.5,
223
  "r": 0.8,
224
  "f": 0.6153846154
225
  },
226
+ "FAC": {
227
+ "p": 0.6666666667,
228
+ "r": 0.3783783784,
229
+ "f": 0.4827586207
230
+ },
231
  "MOVEMENT": {
232
  "p": 0.0,
233
  "r": 0.0,
234
  "f": 0.0
235
  },
236
  "LAW": {
237
  "p": 1.0,
238
  "r": 0.3333333333,
239
  "f": 0.5
240
  },
241
+ "PET_NAME": {
242
+ "p": 0.0,
243
+ "r": 0.0,
244
+ "f": 0.0
245
+ },
246
  "MONEY": {
247
  "p": 1.0,
248
  "r": 1.0,
 
254
  "f": 1.0
255
  }
256
  },
257
+ "speed": 9177.2419617522
258
  }
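
Note on the accuracy.json hunks above: each `f` value should be the harmonic mean of the matching `p` and `r`. The sketch below recomputes it for a few new values copied from this diff, and also compares the old and new `speed` entries. The helper name `f_score` and the tolerance are illustrative choices. The reading of `speed` as a throughput figure where higher is faster is an assumption, since the diff does not state its unit.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (p, r, f) triples copied from the new side of accuracy.json.
new_scores = {
    "ents": (0.7611940299, 0.7056603774, 0.7323759791),
    "sents": (0.9823874755, 0.9901380671, 0.9862475442),
    "dep_las_per_type.csubj": (0.6923076923, 0.75, 0.72),
    "dep_las_per_type.dep": (0.0, 0.0, 0.0),
}

# The stored values are rounded to 10 decimals, so compare with a tolerance.
for name, (p, r, f) in new_scores.items():
    recomputed = f_score(p, r)
    status = "ok" if abs(recomputed - f) < 1e-8 else "MISMATCH"
    print(f"{name}: stored={f} recomputed={recomputed:.10f} {status}")

# Old and new "speed" values from this diff.
old_speed = 4912.7299798978
new_speed = 9177.2419617522
speed_ratio = new_speed / old_speed
print(f"speed ratio (new / old): {speed_ratio:.2f}")
```

A ratio of this size may reflect a different benchmark machine as much as a faster model, so it is best read as indicative only.
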
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
 
config.cfg CHANGED
@@ -35,8 +35,9 @@ overwrite = true
35
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
36
 
37
  [components.morphologizer.model]
38
- @architectures = "spacy.Tagger.v1"
39
  nO = null
 
40
 
41
  [components.morphologizer.model.tok2vec]
42
  @architectures = "spacy.Tok2VecListener.v1"
@@ -66,7 +67,7 @@ nO = null
66
  @architectures = "spacy.MultiHashEmbed.v2"
67
  width = 96
68
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
69
- rows = [5000,2500,2500,2500]
70
  include_static_vectors = true
71
 
72
  [components.ner.model.tok2vec.encode]
@@ -104,8 +105,9 @@ overwrite = false
104
  scorer = {"@scorers":"spacy.senter_scorer.v1"}
105
 
106
  [components.senter.model]
107
- @architectures = "spacy.Tagger.v1"
108
  nO = null
 
109
 
110
  [components.senter.model.tok2vec]
111
  @architectures = "spacy.Tok2Vec.v2"
@@ -134,7 +136,7 @@ factory = "tok2vec"
134
  @architectures = "spacy.MultiHashEmbed.v2"
135
  width = ${components.tok2vec.model.encode:width}
136
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
137
- rows = [5000,2500,2500,2500]
138
  include_static_vectors = true
139
 
140
  [components.tok2vec.model.encode]
@@ -171,7 +173,7 @@ dropout = 0.1
171
  accumulate_gradient = 1
172
  patience = 5000
173
  max_epochs = 0
174
- max_steps = 0
175
  eval_frequency = 1000
176
  frozen_components = []
177
  before_to_disk = null
 
35
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
36
 
37
  [components.morphologizer.model]
38
+ @architectures = "spacy.Tagger.v2"
39
  nO = null
40
+ normalize = false
41
 
42
  [components.morphologizer.model.tok2vec]
43
  @architectures = "spacy.Tok2VecListener.v1"
 
67
  @architectures = "spacy.MultiHashEmbed.v2"
68
  width = 96
69
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
70
+ rows = [5000,1000,2500,2500]
71
  include_static_vectors = true
72
 
73
  [components.ner.model.tok2vec.encode]
 
105
  scorer = {"@scorers":"spacy.senter_scorer.v1"}
106
 
107
  [components.senter.model]
108
+ @architectures = "spacy.Tagger.v2"
109
  nO = null
110
+ normalize = false
111
 
112
  [components.senter.model.tok2vec]
113
  @architectures = "spacy.Tok2Vec.v2"
 
136
  @architectures = "spacy.MultiHashEmbed.v2"
137
  width = ${components.tok2vec.model.encode:width}
138
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
139
+ rows = [5000,1000,2500,2500]
140
  include_static_vectors = true
141
 
142
  [components.tok2vec.model.encode]
 
173
  accumulate_gradient = 1
174
  patience = 5000
175
  max_epochs = 0
176
+ max_steps = 100000
177
  eval_frequency = 1000
178
  frozen_components = []
179
  before_to_disk = null
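
Note on the config.cfg hunks above: the changed settings can be inspected without spaCy installed, because the file uses INI-style sections with JSON-like values. The sketch below parses a small excerpt with the standard library only. The two `components.*` section names are taken from this diff. The `[excerpt]` section is a hypothetical placeholder, since the hunks do not show which sections hold `rows` and `max_steps`. This is not how spaCy itself loads the file, only an approximation for quick inspection.

```python
import configparser
import json

# Excerpt of the new config values shown in this diff.
# "[excerpt]" is a placeholder section name, not one from the real file.
NEW_CONFIG_EXCERPT = """
[components.morphologizer.model]
@architectures = "spacy.Tagger.v2"
nO = null
normalize = false

[components.senter.model]
@architectures = "spacy.Tagger.v2"
nO = null
normalize = false

[excerpt]
rows = [5000,1000,2500,2500]
max_steps = 100000
"""

# Disable interpolation and keep key case (configparser lowercases by default).
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str
parser.read_string(NEW_CONFIG_EXCERPT)

# Decode every value as JSON: null -> None, false -> False, [..] -> list.
parsed = {
    section: {key: json.loads(value) for key, value in parser[section].items()}
    for section in parser.sections()
}

for section, values in parsed.items():
    print(section, values)
```
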
ja_core_news_lg-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1aa64940154d4c04423c309463862d417ab7e232fbf55f790e25f89bb92912bf
3
- size 556188875
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2e41957fc0c8fbf0593d39b975c6cf36a4742dfb7d00b2c4b48161814907db28
3
+ size 555135908
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"ja",
3
  "name":"core_news_lg",
4
- "version":"3.2.0",
5
  "description":"Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
- "spacy_version":">=3.2.0,<3.3.0",
11
- "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":300,
14
  "vectors":480443,
@@ -65,10 +65,6 @@
65
  "obl",
66
  "punct"
67
  ],
68
- "senter":[
69
- "I",
70
- "S"
71
- ],
72
  "attribute_ruler":[
73
 
74
  ],
@@ -120,8 +116,8 @@
120
  "token_p":0.9764591282,
121
  "token_r":0.9790021974,
122
  "token_f":0.9777290092,
123
- "pos_acc":0.9736163946,
124
- "morph_acc":0.0040005162,
125
  "morph_micro_p":0.3401360544,
126
  "morph_micro_r":0.9803921569,
127
  "morph_micro_f":0.5050505051,
@@ -142,219 +138,224 @@
142
  "f":0.0
143
  }
144
  },
145
- "sents_p":0.9862204724,
146
- "sents_r":0.9881656805,
147
- "sents_f":0.9871921182,
148
- "dep_uas":0.9214150689,
149
- "dep_las":0.9080930316,
150
  "dep_las_per_type":{
151
  "cc":{
152
- "p":0.75,
153
- "r":0.75,
154
- "f":0.75
155
  },
156
  "compound":{
157
- "p":0.9486581097,
158
- "r":0.916572717,
159
- "f":0.9323394495
160
  },
161
  "obl":{
162
- "p":0.8233082707,
163
- "r":0.8202247191,
164
- "f":0.8217636023
165
  },
166
  "case":{
167
- "p":0.9892679187,
168
- "r":0.9806231003,
169
- "f":0.9849265407
170
  },
171
  "dislocated":{
172
- "p":0.5,
173
- "r":0.4615384615,
174
- "f":0.48
175
  },
176
  "nsubj":{
177
- "p":0.8281853282,
178
- "r":0.8234165067,
179
- "f":0.8257940327
180
  },
181
  "nmod":{
182
- "p":0.87875,
183
- "r":0.8222222222,
184
- "f":0.8495468278
185
  },
186
  "root":{
187
- "p":0.9560878244,
188
- "r":0.9447731755,
189
- "f":0.9503968254
190
  },
191
  "aux":{
192
- "p":0.9751381215,
193
- "r":0.9832869081,
194
- "f":0.9791955617
195
  },
196
  "advcl":{
197
- "p":0.6810933941,
198
- "r":0.6719101124,
199
- "f":0.6764705882
200
  },
201
  "mark":{
202
- "p":0.971659919,
203
- "r":0.96,
204
- "f":0.9657947686
205
  },
206
  "fixed":{
207
- "p":0.9571428571,
208
- "r":0.9745454545,
209
- "f":0.9657657658
210
  },
211
  "acl":{
212
- "p":0.8492239468,
213
- "r":0.8417582418,
214
- "f":0.8454746137
215
  },
216
  "obj":{
217
- "p":0.9662576687,
218
- "r":0.9516616314,
219
- "f":0.9589041096
220
  },
221
  "nummod":{
222
- "p":0.9806451613,
223
  "r":0.899408284,
224
- "f":0.9382716049
225
  },
226
  "advmod":{
227
- "p":0.6691729323,
228
- "r":0.6357142857,
229
- "f":0.652014652
230
  },
231
  "amod":{
232
- "p":0.9310344828,
233
- "r":0.7297297297,
234
- "f":0.8181818182
235
  },
236
  "cop":{
237
- "p":0.9634146341,
238
- "r":0.9186046512,
239
- "f":0.9404761905
240
  },
241
  "ccomp":{
242
- "p":0.8571428571,
243
- "r":0.8181818182,
244
- "f":0.8372093023
245
- },
246
- "csubj":{
247
- "p":0.4444444444,
248
- "r":0.6666666667,
249
- "f":0.5333333333
250
  },
251
  "det":{
252
- "p":0.9807692308,
253
- "r":0.9622641509,
254
- "f":0.9714285714
255
  },
256
  "dep":{
257
- "p":0.0769230769,
258
- "r":0.1428571429,
259
- "f":0.1
260
  }
261
  },
262
- "tag_acc":0.9715755942,
263
- "lemma_acc":0.9659109444,
264
- "ents_p":0.7402422611,
265
- "ents_r":0.6918238994,
266
- "ents_f":0.7152145644,
267
  "ents_per_type":{
268
  "DATE":{
269
- "p":0.9553571429,
270
- "r":0.9816513761,
271
- "f":0.9683257919
272
  },
273
  "ORG":{
274
- "p":0.5916666667,
275
- "r":0.5182481752,
276
- "f":0.5525291829
277
  },
278
  "PERSON":{
279
- "p":0.7816901408,
280
- "r":0.7985611511,
281
- "f":0.7900355872
282
  },
283
  "GPE":{
284
- "p":0.6774193548,
285
- "r":0.670212766,
286
- "f":0.6737967914
287
  },
288
- "QUANTITY":{
289
- "p":0.8194444444,
290
- "r":0.8939393939,
291
- "f":0.8550724638
292
  },
293
  "TIME":{
294
- "p":0.6666666667,
295
  "r":1.0,
296
- "f":0.8
297
  },
298
  "NORP":{
299
- "p":0.7407407407,
300
  "r":0.625,
301
- "f":0.6779661017
302
  },
303
  "ORDINAL":{
304
- "p":0.56,
305
  "r":0.6363636364,
306
- "f":0.5957446809
307
  },
308
  "TITLE_AFFIX":{
309
- "p":0.7916666667,
310
- "r":0.6333333333,
311
- "f":0.7037037037
312
  },
313
  "WORK_OF_ART":{
314
- "p":0.75,
315
  "r":0.7058823529,
316
- "f":0.7272727273
317
- },
318
- "EVENT":{
319
- "p":0.8823529412,
320
- "r":0.5769230769,
321
- "f":0.6976744186
322
  },
323
  "PERCENT":{
324
  "p":1.0,
325
  "r":0.2857142857,
326
  "f":0.4444444444
327
  },
328
  "CARDINAL":{
329
  "p":0.0,
330
  "r":0.0,
331
  "f":0.0
332
  },
333
- "FAC":{
334
- "p":0.5666666667,
335
- "r":0.4594594595,
336
- "f":0.5074626866
337
- },
338
  "LOC":{
339
  "p":0.5,
340
  "r":0.8,
341
  "f":0.6153846154
342
  },
343
  "MOVEMENT":{
344
  "p":0.0,
345
  "r":0.0,
346
  "f":0.0
347
  },
348
- "PRODUCT":{
349
- "p":0.5384615385,
350
- "r":0.3333333333,
351
- "f":0.4117647059
352
- },
353
  "LAW":{
354
  "p":1.0,
355
  "r":0.3333333333,
356
  "f":0.5
357
  },
358
  "MONEY":{
359
  "p":1.0,
360
  "r":1.0,
@@ -366,7 +367,7 @@
366
  "f":1.0
367
  }
368
  },
369
- "speed":4912.7299798978
370
  },
371
  "sources":[
372
  {
@@ -389,7 +390,7 @@
389
  }
390
  ],
391
  "requirements":[
392
- "sudachipy>=0.4.9",
393
- "sudachidict-core>=20200330"
394
  ]
395
  }
 
1
  {
2
  "lang":"ja",
3
  "name":"core_news_lg",
4
+ "version":"3.3.0",
5
  "description":"Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
+ "spacy_version":">=3.3.0.dev0,<3.4.0",
11
+ "spacy_git_version":"849bef2de",
12
  "vectors":{
13
  "width":300,
14
  "vectors":480443,
 
65
  "obl",
66
  "punct"
67
  ],
68
  "attribute_ruler":[
69
 
70
  ],
 
116
  "token_p":0.9764591282,
117
  "token_r":0.9790021974,
118
  "token_f":0.9777290092,
119
+ "pos_acc":0.9745981102,
120
+ "morph_acc":0.0,
121
  "morph_micro_p":0.3401360544,
122
  "morph_micro_r":0.9803921569,
123
  "morph_micro_f":0.5050505051,
 
138
  "f":0.0
139
  }
140
  },
141
+ "sents_p":0.9823874755,
142
+ "sents_r":0.9901380671,
143
+ "sents_f":0.9862475442,
144
+ "dep_uas":0.9222457341,
145
+ "dep_las":0.9090090901,
146
  "dep_las_per_type":{
147
  "cc":{
148
+ "p":0.7755102041,
149
+ "r":0.7916666667,
150
+ "f":0.7835051546
151
  },
152
  "compound":{
153
+ "p":0.9439906651,
154
+ "r":0.9120631342,
155
+ "f":0.9277522936
156
  },
157
  "obl":{
158
+ "p":0.8172715895,
159
+ "r":0.8152309613,
160
+ "f":0.81625
161
  },
162
  "case":{
163
+ "p":0.9900230238,
164
+ "r":0.9802431611,
165
+ "f":0.9851088202
166
  },
167
  "dislocated":{
168
+ "p":0.7,
169
+ "r":0.5384615385,
170
+ "f":0.6086956522
171
  },
172
  "nsubj":{
173
+ "p":0.8349514563,
174
+ "r":0.8253358925,
175
+ "f":0.8301158301
176
  },
177
  "nmod":{
178
+ "p":0.8793532338,
179
+ "r":0.8269005848,
180
+ "f":0.8523206751
181
  },
182
  "root":{
183
+ "p":0.958250497,
184
+ "r":0.9506903353,
185
+ "f":0.9544554455
186
  },
187
  "aux":{
188
+ "p":0.9851301115,
189
+ "r":0.9842154132,
190
+ "f":0.9846725499
191
  },
192
  "advcl":{
193
+ "p":0.693877551,
194
+ "r":0.6876404494,
195
+ "f":0.690744921
196
  },
197
  "mark":{
198
+ "p":0.9775510204,
199
+ "r":0.958,
200
+ "f":0.9676767677
201
  },
202
  "fixed":{
203
+ "p":0.9536541889,
204
+ "r":0.9727272727,
205
+ "f":0.9630963096
206
  },
207
  "acl":{
208
+ "p":0.8409586057,
209
+ "r":0.8483516484,
210
+ "f":0.8446389497
211
  },
212
  "obj":{
213
+ "p":0.9541284404,
214
+ "r":0.9425981873,
215
+ "f":0.9483282675
216
  },
217
  "nummod":{
218
+ "p":0.987012987,
219
  "r":0.899408284,
220
+ "f":0.9411764706
221
  },
222
  "advmod":{
223
+ "p":0.6904761905,
224
+ "r":0.6214285714,
225
+ "f":0.6541353383
226
  },
227
  "amod":{
228
+ "p":0.962962963,
229
+ "r":0.7027027027,
230
+ "f":0.8125
231
  },
232
  "cop":{
233
+ "p":0.9534883721,
234
+ "r":0.9534883721,
235
+ "f":0.9534883721
236
  },
237
  "ccomp":{
238
+ "p":0.68,
239
+ "r":0.7727272727,
240
+ "f":0.7234042553
241
  },
242
  "det":{
243
+ "p":1.0,
244
+ "r":0.9811320755,
245
+ "f":0.9904761905
246
+ },
247
+ "csubj":{
248
+ "p":0.6923076923,
249
+ "r":0.75,
250
+ "f":0.72
251
  },
252
  "dep":{
253
+ "p":0.0,
254
+ "r":0.0,
255
+ "f":0.0
256
  }
257
  },
258
+ "tag_acc":0.9712488769,
259
+ "lemma_acc":0.965013864,
260
+ "ents_p":0.7611940299,
261
+ "ents_r":0.7056603774,
262
+ "ents_f":0.7323759791,
263
  "ents_per_type":{
264
  "DATE":{
265
+ "p":0.9541284404,
266
+ "r":0.9541284404,
267
+ "f":0.9541284404
268
  },
269
  "ORG":{
270
+ "p":0.6639344262,
271
+ "r":0.5912408759,
272
+ "f":0.6254826255
273
  },
274
  "PERSON":{
275
+ "p":0.7714285714,
276
+ "r":0.7769784173,
277
+ "f":0.7741935484
278
  },
279
  "GPE":{
280
+ "p":0.7272727273,
281
+ "r":0.6808510638,
282
+ "f":0.7032967033
283
  },
284
+ "PRODUCT":{
285
+ "p":0.5,
286
+ "r":0.4047619048,
287
+ "f":0.4473684211
288
  },
289
  "TIME":{
290
+ "p":0.5714285714,
291
  "r":1.0,
292
+ "f":0.7272727273
293
+ },
294
+ "QUANTITY":{
295
+ "p":0.884057971,
296
+ "r":0.9242424242,
297
+ "f":0.9037037037
298
  },
299
  "NORP":{
300
+ "p":0.8,
301
  "r":0.625,
302
+ "f":0.701754386
303
  },
304
  "ORDINAL":{
305
+ "p":0.7,
306
  "r":0.6363636364,
307
+ "f":0.6666666667
308
  },
309
  "TITLE_AFFIX":{
310
+ "p":0.7407407407,
311
+ "r":0.6666666667,
312
+ "f":0.701754386
313
  },
314
  "WORK_OF_ART":{
315
+ "p":0.8571428571,
316
  "r":0.7058823529,
317
+ "f":0.7741935484
318
  },
319
  "PERCENT":{
320
  "p":1.0,
321
  "r":0.2857142857,
322
  "f":0.4444444444
323
  },
324
+ "EVENT":{
325
+ "p":0.75,
326
+ "r":0.6923076923,
327
+ "f":0.72
328
+ },
329
  "CARDINAL":{
330
  "p":0.0,
331
  "r":0.0,
332
  "f":0.0
333
  },
334
  "LOC":{
335
  "p":0.5,
336
  "r":0.8,
337
  "f":0.6153846154
338
  },
339
+ "FAC":{
340
+ "p":0.6666666667,
341
+ "r":0.3783783784,
342
+ "f":0.4827586207
343
+ },
344
  "MOVEMENT":{
345
  "p":0.0,
346
  "r":0.0,
347
  "f":0.0
348
  },
349
  "LAW":{
350
  "p":1.0,
351
  "r":0.3333333333,
352
  "f":0.5
353
  },
354
+ "PET_NAME":{
355
+ "p":0.0,
356
+ "r":0.0,
357
+ "f":0.0
358
+ },
359
  "MONEY":{
360
  "p":1.0,
361
  "r":1.0,
 
367
  "f":1.0
368
  }
369
  },
370
+ "speed":9177.2419617522
371
  },
372
  "sources":[
373
  {
 
390
  }
391
  ],
392
  "requirements":[
393
+ "sudachipy>=0.5.2,!=0.6.1",
394
+ "sudachidict-core>=20211220"
395
  ]
396
  }
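
Note on the meta.json `requirements` change above: the new `sudachipy` pin both raises the minimum version and excludes one release. The sketch below evaluates such a specifier with a deliberately simplified comparison. The functions `parse_version` and `satisfies` are illustrative and handle only dotted integers. Real installers follow PEP 440, so suffixes such as `.dev0` in the `spacy_version` field are out of scope here.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text: str) -> tuple:
    """Turn '0.5.2' into (0, 5, 2). Only dotted integers are supported."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS.items():
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Specifiers copied from the new meta.json in this diff.
SUDACHIPY_SPEC = ">=0.5.2,!=0.6.1"
SUDACHIDICT_SPEC = ">=20211220"

for version in ("0.4.9", "0.5.2", "0.6.1", "0.6.2"):
    print("sudachipy", version, satisfies(version, SUDACHIPY_SPEC))
```

Under this reading, an environment that satisfied the old pin `sudachipy>=0.4.9` with a 0.4.x release would need an upgrade before installing the 3.3.0 package.
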
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:fa52799c89b4b2f3bfd27211e1bc9724f68b328e6e73efbbd5af21ee973b208d
3
- size 7749
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c1f0b2cd0b894e6aa09768e067626f87db1a3494320d0414220996ac0bea727
3
+ size 7801
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bd1ec15f7a957da81af239427ee9447f85011d3bdd61938bfb82a3b9bdb1d103
3
- size 6961103
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9684ef4a41de356943352cd296fb2b49e30f4312fb040ff425fa0a9f3e8cd4b7
3
+ size 6385103
parser/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f1ef5830947469daf5c77ce70675b8b4790e7348c6da2d3b68036ad49bf85b67
3
  size 299888
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9f68fccc840e9dbe4f062ad83ec768c7e3075d99751ae5d913de770613b73677
3
  size 299888
parser/moves CHANGED
@@ -1 +1 @@
1
- ��moves�~{"0":{"":75051},"1":{"":81581},"2":{"compound":22178,"nmod":11296,"obl":10522,"nsubj":6649,"acl":6185,"advcl":5956,"obj":4364,"nummod":2247,"advmod":1841,"punct":1169,"det":822,"cc":699,"amod":357,"ccomp":335,"dislocated":233,"csubj":139,"dep":0},"3":{"case":35390,"punct":15051,"aux":14506,"fixed":7377,"mark":6390,"cop":2079,"compound":542,"advcl":148,"dep":56},"4":{"ROOT":6810}}�cfg��neg_key�
 
1
+ ��moves�~{"0":{"":77992},"1":{"":83293},"2":{"compound":23506,"nmod":11446,"obl":11030,"nsubj":6884,"advcl":6063,"acl":6020,"obj":4629,"nummod":2487,"advmod":1922,"punct":1321,"det":830,"cc":726,"amod":372,"ccomp":325,"dislocated":235,"csubj":133,"dep":0},"3":{"case":35913,"punct":15455,"aux":14940,"fixed":7391,"mark":6644,"cop":2100,"compound":598,"advcl":152,"dep":58},"4":{"ROOT":7050}}�cfg��neg_key�
senter/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:579d7bdac1f8d5a155770ac75a57c0eedbe10dcec436cd7f8d1303529794eb5f
3
- size 213211
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d03c1c3d77a449456e344f49ecdbadf6265700d553876929c382bb7f762f746e
3
+ size 213263
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:211222e3e2f54463c355e70807feb17213998f5a0950241794d78193d11626ff
3
- size 6811418
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff4955abec8fdefd6a93e0788a840101803222fde3f0c181d078b3c1adadbc1c
3
+ size 6235418
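
Note on the Git LFS pointer hunks above: each pointer is three `key value` lines, so the size change of a binary can be read straight from the diff. The sketch below parses the old and new `tok2vec/model` pointers copied from this diff and compares the size delta with the `rows` change in config.cfg (second entry 2500 to 1000, width 96). The helper `parse_lfs_pointer` is illustrative. The 4-bytes-per-value (float32) storage is an assumption, and the match is an observation consistent with the diff rather than a confirmed explanation.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, oid and integer size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"],
        "size": int(fields["size"]),
    }


OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:211222e3e2f54463c355e70807feb17213998f5a0950241794d78193d11626ff
size 6811418
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ff4955abec8fdefd6a93e0788a840101803222fde3f0c181d078b3c1adadbc1c
size 6235418
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
size_delta = new["size"] - old["size"]

# Embedding rows removed by the config change, times width, times an
# assumed 4 bytes per float32 value.
rows_removed = 2500 - 1000
width = 96
expected_shrink = rows_removed * width * 4

print("size delta in bytes:", size_delta)
print("expected shrink in bytes:", expected_shrink)
```

The `ner/model` pointer in this diff shrinks by the same number of bytes (6961103 to 6385103), which fits the same `rows` change being applied to the NER embedding layer.
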
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0a3705f398a0ff28af5fa868cf82b05cb31087511af0019249e05f7ee9dab7db
3
- size 15570179
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9993f3e45abd60896e44b374466400eb1ab3c1a8399fb4432cff07ec6c698a24
3
+ size 15614088