EC2 Default User committed on
Commit
bc843e6
1 Parent(s): ffe2dc9

Update spaCy pipeline

README.md CHANGED
@@ -14,47 +14,62 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.6767371601
18
  - name: NER Recall
19
  type: recall
20
- value: 0.5635220126
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.6149622512
 
 
 
 
 
 
 
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
- - name: POS Accuracy
29
  type: accuracy
30
- value: 0.9715755942
31
  - task:
32
- name: SENTER
33
  type: token-classification
34
  metrics:
35
- - name: SENTER Precision
36
- type: precision
37
- value: 0.9823529412
38
- - name: SENTER Recall
39
- type: recall
40
- value: 0.9881656805
41
- - name: SENTER F Score
42
- type: f_score
43
- value: 0.9852507375
44
  - task:
45
- name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
- - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.921981982
 
 
 
 
 
 
 
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
- - name: Labeled Dependencies Accuracy
56
- type: accuracy
57
- value: 0.921981982
 
 
 
 
 
 
 
58
  ---
59
  ### Details: https://spacy.io/models/ja#ja_core_news_sm
60
 
@@ -63,8 +78,8 @@ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser,
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `ja_core_news_sm` |
66
- | **Version** | `3.2.0` |
67
- | **spaCy** | `>=3.2.0,<3.3.0` |
68
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `ner` |
69
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `ner` |
70
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
@@ -76,13 +91,12 @@ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser,
76
 
77
  <details>
78
 
79
- <summary>View label scheme (66 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
  | **`morphologizer`** | `POS=NOUN`, `POS=ADP`, `POS=VERB`, `POS=SCONJ`, `POS=AUX`, `POS=PUNCT`, `POS=PART`, `POS=DET`, `POS=NUM`, `POS=ADV`, `POS=PRON`, `POS=ADJ`, `POS=PROPN`, `POS=CCONJ`, `POS=SYM`, `POS=NOUN\|Polarity=Neg`, `POS=AUX\|Polarity=Neg`, `POS=INTJ`, `POS=SCONJ\|Polarity=Neg` |
84
  | **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
85
- | **`senter`** | `I`, `S` |
86
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
87
 
88
  </details>
@@ -95,18 +109,18 @@ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser,
95
  | `TOKEN_P` | 97.65 |
96
  | `TOKEN_R` | 97.90 |
97
  | `TOKEN_F` | 97.77 |
98
- | `POS_ACC` | 96.25 |
99
- | `MORPH_ACC` | 0.40 |
100
  | `MORPH_MICRO_P` | 34.01 |
101
  | `MORPH_MICRO_R` | 98.04 |
102
  | `MORPH_MICRO_F` | 50.51 |
103
- | `SENTS_P` | 98.24 |
104
- | `SENTS_R` | 98.82 |
105
- | `SENTS_F` | 98.53 |
106
- | `DEP_UAS` | 92.20 |
107
- | `DEP_LAS` | 90.69 |
108
- | `TAG_ACC` | 97.16 |
109
- | `LEMMA_ACC` | 96.59 |
110
- | `ENTS_P` | 67.67 |
111
- | `ENTS_R` | 56.35 |
112
- | `ENTS_F` | 61.50 |
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.6996904025
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.5685534591
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.6273421235
24
+ - task:
25
+ name: TAG
26
+ type: token-classification
27
+ metrics:
28
+ - name: TAG (XPOS) Accuracy
29
+ type: accuracy
30
+ value: 0.9712488769
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
+ - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.9616721888
38
  - task:
39
+ name: MORPH
40
  type: token-classification
41
  metrics:
42
+ - name: Morph (UFeats) Accuracy
43
+ type: accuracy
44
+ value: 0.0
 
 
 
 
 
 
45
  - task:
46
+ name: LEMMA
47
  type: token-classification
48
  metrics:
49
+ - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.965013864
52
+ - task:
53
+ name: UNLABELED_DEPENDENCIES
54
+ type: token-classification
55
+ metrics:
56
+ - name: Unlabeled Attachment Score (UAS)
57
+ type: f_score
58
+ value: 0.9207149611
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
+ - name: Labeled Attachment Score (LAS)
64
+ type: f_score
65
+ value: 0.9061220818
66
+ - task:
67
+ name: SENTS
68
+ type: token-classification
69
+ metrics:
70
+ - name: Sentences F-Score
71
+ type: f_score
72
+ value: 0.9774288518
73
  ---
74
  ### Details: https://spacy.io/models/ja#ja_core_news_sm
75
 
 
78
  | Feature | Description |
79
  | --- | --- |
80
  | **Name** | `ja_core_news_sm` |
81
+ | **Version** | `3.3.0` |
82
+ | **spaCy** | `>=3.3.0.dev0,<3.4.0` |
83
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `ner` |
84
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `ner` |
85
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 
91
 
92
  <details>
93
 
94
+ <summary>View label scheme (64 labels for 3 components)</summary>
95
 
96
  | Component | Labels |
97
  | --- | --- |
98
  | **`morphologizer`** | `POS=NOUN`, `POS=ADP`, `POS=VERB`, `POS=SCONJ`, `POS=AUX`, `POS=PUNCT`, `POS=PART`, `POS=DET`, `POS=NUM`, `POS=ADV`, `POS=PRON`, `POS=ADJ`, `POS=PROPN`, `POS=CCONJ`, `POS=SYM`, `POS=NOUN\|Polarity=Neg`, `POS=AUX\|Polarity=Neg`, `POS=INTJ`, `POS=SCONJ\|Polarity=Neg` |
99
  | **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
 
100
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
101
 
102
  </details>
 
109
  | `TOKEN_P` | 97.65 |
110
  | `TOKEN_R` | 97.90 |
111
  | `TOKEN_F` | 97.77 |
112
+ | `POS_ACC` | 96.17 |
113
+ | `MORPH_ACC` | 0.00 |
114
  | `MORPH_MICRO_P` | 34.01 |
115
  | `MORPH_MICRO_R` | 98.04 |
116
  | `MORPH_MICRO_F` | 50.51 |
117
+ | `SENTS_P` | 97.27 |
118
+ | `SENTS_R` | 98.22 |
119
+ | `SENTS_F` | 97.74 |
120
+ | `DEP_UAS` | 92.07 |
121
+ | `DEP_LAS` | 90.61 |
122
+ | `TAG_ACC` | 97.12 |
123
+ | `LEMMA_ACC` | 96.50 |
124
+ | `ENTS_P` | 69.97 |
125
+ | `ENTS_R` | 56.86 |
126
+ | `ENTS_F` | 62.73 |
accuracy.json CHANGED
@@ -3,8 +3,8 @@
3
  "token_p": 0.9764591282,
4
  "token_r": 0.9790021974,
5
  "token_f": 0.9777290092,
6
- "pos_acc": 0.9624933535,
7
- "morph_acc": 0.0040005162,
8
  "morph_micro_p": 0.3401360544,
9
  "morph_micro_r": 0.9803921569,
10
  "morph_micro_f": 0.5050505051,
@@ -25,193 +25,198 @@
25
  "f": 0.0
26
  }
27
  },
28
- "sents_p": 0.9823529412,
29
- "sents_r": 0.9881656805,
30
- "sents_f": 0.9852507375,
31
- "dep_uas": 0.921981982,
32
- "dep_las": 0.9069390694,
33
  "dep_las_per_type": {
34
  "cc": {
35
- "p": 0.8163265306,
36
- "r": 0.8333333333,
37
- "f": 0.824742268
38
  },
39
  "compound": {
40
- "p": 0.9411421911,
41
  "r": 0.9103720406,
42
- "f": 0.9255014327
43
  },
44
  "obl": {
45
- "p": 0.8113207547,
46
- "r": 0.8052434457,
47
- "f": 0.8082706767
48
  },
49
  "case": {
50
- "p": 0.9892638037,
51
- "r": 0.9802431611,
52
- "f": 0.9847328244
53
  },
54
  "dislocated": {
55
- "p": 0.7,
56
- "r": 0.5384615385,
57
- "f": 0.6086956522
58
  },
59
  "nsubj": {
60
- "p": 0.8202676864,
61
- "r": 0.8234165067,
62
- "f": 0.8218390805
63
  },
64
  "nmod": {
65
- "p": 0.8793532338,
66
- "r": 0.8269005848,
67
- "f": 0.8523206751
68
  },
69
  "root": {
70
- "p": 0.9741035857,
71
- "r": 0.9644970414,
72
- "f": 0.9692765114
73
  },
74
  "aux": {
75
- "p": 0.9796296296,
76
- "r": 0.982358403,
77
- "f": 0.9809921187
78
  },
79
  "advcl": {
80
- "p": 0.6893424036,
81
- "r": 0.6831460674,
82
- "f": 0.6862302483
83
  },
84
  "mark": {
85
- "p": 0.9755600815,
86
- "r": 0.958,
87
- "f": 0.9667003027
88
  },
89
  "fixed": {
90
- "p": 0.9569892473,
91
  "r": 0.9709090909,
92
- "f": 0.963898917
93
  },
94
  "acl": {
95
- "p": 0.8239130435,
96
- "r": 0.832967033,
97
- "f": 0.8284153005
98
  },
99
  "obj": {
100
- "p": 0.9507692308,
101
- "r": 0.9335347432,
102
- "f": 0.9420731707
103
  },
104
  "nummod": {
105
- "p": 0.9934640523,
106
- "r": 0.899408284,
107
- "f": 0.9440993789
108
  },
109
  "advmod": {
110
- "p": 0.6771653543,
111
- "r": 0.6142857143,
112
- "f": 0.6441947566
113
  },
114
  "amod": {
115
- "p": 0.8709677419,
116
- "r": 0.7297297297,
117
- "f": 0.7941176471
118
  },
119
  "cop": {
120
- "p": 0.9464285714,
121
- "r": 0.9244186047,
122
- "f": 0.9352941176
123
  },
124
  "ccomp": {
125
- "p": 0.8571428571,
126
- "r": 0.8181818182,
127
- "f": 0.8372093023
128
- },
129
- "det": {
130
  "p": 1.0,
131
- "r": 0.9811320755,
132
- "f": 0.9904761905
133
  },
134
  "dep": {
135
- "p": 0.0714285714,
136
- "r": 0.1428571429,
137
- "f": 0.0952380952
138
  },
139
  "csubj": {
140
- "p": 0.6363636364,
141
- "r": 0.5833333333,
142
- "f": 0.6086956522
 
 
 
 
 
143
  }
144
  },
145
- "tag_acc": 0.9715755942,
146
- "lemma_acc": 0.9659109444,
147
- "ents_p": 0.6767371601,
148
- "ents_r": 0.5635220126,
149
- "ents_f": 0.6149622512,
150
  "ents_per_type": {
151
  "DATE": {
152
- "p": 0.9285714286,
153
  "r": 0.9541284404,
154
- "f": 0.9411764706
 
 
 
 
 
155
  },
156
  "ORG": {
157
- "p": 0.5046728972,
158
- "r": 0.3941605839,
159
- "f": 0.4426229508
160
  },
161
- "GPE": {
162
- "p": 0.6296296296,
163
- "r": 0.5425531915,
164
- "f": 0.5828571429
165
  },
166
- "PRODUCT": {
167
- "p": 0.4285714286,
168
- "r": 0.2142857143,
169
- "f": 0.2857142857
170
  },
171
  "TIME": {
172
- "p": 0.5714285714,
173
  "r": 1.0,
174
- "f": 0.7272727273
175
- },
176
- "QUANTITY": {
177
- "p": 0.7972972973,
178
- "r": 0.8939393939,
179
- "f": 0.8428571429
180
  },
181
  "PERSON": {
182
- "p": 0.6346153846,
183
- "r": 0.4748201439,
184
- "f": 0.5432098765
185
  },
186
  "NORP": {
187
  "p": 0.6666666667,
188
  "r": 0.5625,
189
  "f": 0.6101694915
190
  },
 
 
 
 
 
191
  "TITLE_AFFIX": {
192
- "p": 0.7142857143,
193
- "r": 0.5,
194
- "f": 0.5882352941
195
  },
196
- "ORDINAL": {
197
- "p": 0.5909090909,
198
- "r": 0.5909090909,
199
- "f": 0.5909090909
200
  },
201
  "WORK_OF_ART": {
202
- "p": 0.7692307692,
203
  "r": 0.5882352941,
204
- "f": 0.6666666667
205
- },
206
- "EVENT": {
207
- "p": 0.6,
208
- "r": 0.4615384615,
209
- "f": 0.5217391304
210
  },
211
  "PERCENT": {
212
- "p": 1.0,
213
  "r": 0.2857142857,
214
- "f": 0.4444444444
 
 
 
 
 
215
  },
216
  "CARDINAL": {
217
  "p": 0.0,
@@ -219,14 +224,9 @@
219
  "f": 0.0
220
  },
221
  "LOC": {
222
- "p": 0.7,
223
- "r": 0.7,
224
- "f": 0.7
225
- },
226
- "FAC": {
227
- "p": 0.44,
228
- "r": 0.2972972973,
229
- "f": 0.3548387097
230
  },
231
  "MOVEMENT": {
232
  "p": 0.0,
@@ -234,14 +234,14 @@
234
  "f": 0.0
235
  },
236
  "LAW": {
237
- "p": 0.0,
238
- "r": 0.0,
239
- "f": 0.0
240
  },
241
  "MONEY": {
242
- "p": 1.0,
243
  "r": 1.0,
244
- "f": 1.0
245
  },
246
  "LANGUAGE": {
247
  "p": 1.0,
@@ -249,5 +249,5 @@
249
  "f": 1.0
250
  }
251
  },
252
- "speed": 4844.6752174676
253
  }
 
3
  "token_p": 0.9764591282,
4
  "token_r": 0.9790021974,
5
  "token_f": 0.9777290092,
6
+ "pos_acc": 0.9616721888,
7
+ "morph_acc": 0.0,
8
  "morph_micro_p": 0.3401360544,
9
  "morph_micro_r": 0.9803921569,
10
  "morph_micro_f": 0.5050505051,
 
25
  "f": 0.0
26
  }
27
  },
28
+ "sents_p": 0.97265625,
29
+ "sents_r": 0.9822485207,
30
+ "sents_f": 0.9774288518,
31
+ "dep_uas": 0.9207149611,
32
+ "dep_las": 0.9061220818,
33
  "dep_las_per_type": {
34
  "cc": {
35
+ "p": 0.8260869565,
36
+ "r": 0.7916666667,
37
+ "f": 0.8085106383
38
  },
39
  "compound": {
40
+ "p": 0.9384079024,
41
  "r": 0.9103720406,
42
+ "f": 0.9241773963
43
  },
44
  "obl": {
45
+ "p": 0.813283208,
46
+ "r": 0.8102372035,
47
+ "f": 0.8117573483
48
  },
49
  "case": {
50
+ "p": 0.9881226054,
51
+ "r": 0.9798632219,
52
+ "f": 0.9839755818
53
  },
54
  "dislocated": {
55
+ "p": 0.5,
56
+ "r": 0.3846153846,
57
+ "f": 0.4347826087
58
  },
59
  "nsubj": {
60
+ "p": 0.8188824663,
61
+ "r": 0.8157389635,
62
+ "f": 0.8173076923
63
  },
64
  "nmod": {
65
+ "p": 0.8879093199,
66
+ "r": 0.8245614035,
67
+ "f": 0.855063675
68
  },
69
  "root": {
70
+ "p": 0.9643564356,
71
+ "r": 0.9605522682,
72
+ "f": 0.9624505929
73
  },
74
  "aux": {
75
+ "p": 0.9788213628,
76
+ "r": 0.9870009285,
77
+ "f": 0.9828941285
78
  },
79
  "advcl": {
80
+ "p": 0.6824324324,
81
+ "r": 0.6808988764,
82
+ "f": 0.6816647919
83
  },
84
  "mark": {
85
+ "p": 0.9696969697,
86
+ "r": 0.96,
87
+ "f": 0.9648241206
88
  },
89
  "fixed": {
90
+ "p": 0.963898917,
91
  "r": 0.9709090909,
92
+ "f": 0.9673913043
93
  },
94
  "acl": {
95
+ "p": 0.8252212389,
96
+ "r": 0.8197802198,
97
+ "f": 0.822491731
98
  },
99
  "obj": {
100
+ "p": 0.9446153846,
101
+ "r": 0.9274924471,
102
+ "f": 0.9359756098
103
  },
104
  "nummod": {
105
+ "p": 0.9805194805,
106
+ "r": 0.8934911243,
107
+ "f": 0.9349845201
108
  },
109
  "advmod": {
110
+ "p": 0.6788321168,
111
+ "r": 0.6642857143,
112
+ "f": 0.6714801444
113
  },
114
  "amod": {
115
+ "p": 0.8125,
116
+ "r": 0.7027027027,
117
+ "f": 0.7536231884
118
  },
119
  "cop": {
120
+ "p": 0.9756097561,
121
+ "r": 0.9302325581,
122
+ "f": 0.9523809524
123
  },
124
  "ccomp": {
 
 
 
 
 
125
  "p": 1.0,
126
+ "r": 0.8636363636,
127
+ "f": 0.9268292683
128
  },
129
  "dep": {
130
+ "p": 0.0,
131
+ "r": 0.0,
132
+ "f": 0.0
133
  },
134
  "csubj": {
135
+ "p": 0.5333333333,
136
+ "r": 0.6666666667,
137
+ "f": 0.5925925926
138
+ },
139
+ "det": {
140
+ "p": 1.0,
141
+ "r": 0.9811320755,
142
+ "f": 0.9904761905
143
  }
144
  },
145
+ "tag_acc": 0.9712488769,
146
+ "lemma_acc": 0.965013864,
147
+ "ents_p": 0.6996904025,
148
+ "ents_r": 0.5685534591,
149
+ "ents_f": 0.6273421235,
150
  "ents_per_type": {
151
  "DATE": {
152
+ "p": 0.9454545455,
153
  "r": 0.9541284404,
154
+ "f": 0.9497716895
155
+ },
156
+ "PRODUCT": {
157
+ "p": 0.4814814815,
158
+ "r": 0.3095238095,
159
+ "f": 0.3768115942
160
  },
161
  "ORG": {
162
+ "p": 0.5148514851,
163
+ "r": 0.3795620438,
164
+ "f": 0.4369747899
165
  },
166
+ "QUANTITY": {
167
+ "p": 0.8243243243,
168
+ "r": 0.9242424242,
169
+ "f": 0.8714285714
170
  },
171
+ "GPE": {
172
+ "p": 0.6179775281,
173
+ "r": 0.585106383,
174
+ "f": 0.6010928962
175
  },
176
  "TIME": {
177
+ "p": 0.6666666667,
178
  "r": 1.0,
179
+ "f": 0.8
 
 
 
 
 
180
  },
181
  "PERSON": {
182
+ "p": 0.6632653061,
183
+ "r": 0.4676258993,
184
+ "f": 0.5485232068
185
  },
186
  "NORP": {
187
  "p": 0.6666666667,
188
  "r": 0.5625,
189
  "f": 0.6101694915
190
  },
191
+ "ORDINAL": {
192
+ "p": 0.5185185185,
193
+ "r": 0.6363636364,
194
+ "f": 0.5714285714
195
+ },
196
  "TITLE_AFFIX": {
197
+ "p": 0.6875,
198
+ "r": 0.3666666667,
199
+ "f": 0.4782608696
200
  },
201
+ "FAC": {
202
+ "p": 0.5882352941,
203
+ "r": 0.2702702703,
204
+ "f": 0.3703703704
205
  },
206
  "WORK_OF_ART": {
207
+ "p": 0.9090909091,
208
  "r": 0.5882352941,
209
+ "f": 0.7142857143
 
 
 
 
 
210
  },
211
  "PERCENT": {
212
+ "p": 0.6666666667,
213
  "r": 0.2857142857,
214
+ "f": 0.4
215
+ },
216
+ "EVENT": {
217
+ "p": 0.8125,
218
+ "r": 0.5,
219
+ "f": 0.619047619
220
  },
221
  "CARDINAL": {
222
  "p": 0.0,
 
224
  "f": 0.0
225
  },
226
  "LOC": {
227
+ "p": 0.8571428571,
228
+ "r": 0.6,
229
+ "f": 0.7058823529
 
 
 
 
 
230
  },
231
  "MOVEMENT": {
232
  "p": 0.0,
 
234
  "f": 0.0
235
  },
236
  "LAW": {
237
+ "p": 1.0,
238
+ "r": 0.3333333333,
239
+ "f": 0.5
240
  },
241
  "MONEY": {
242
+ "p": 0.875,
243
  "r": 1.0,
244
+ "f": 0.9333333333
245
  },
246
  "LANGUAGE": {
247
  "p": 1.0,
 
249
  "f": 1.0
250
  }
251
  },
252
+ "speed": 10590.4387625828
253
  }
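The precision, recall and F-score triples in the updated accuracy.json can be sanity-checked against each other. The following is a minimal sketch, assuming the reported F-scores are the usual harmonic mean of precision and recall (F1); the helper name `f_score` is illustrative and not part of the pipeline.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the updated accuracy.json in this commit.
ents_p, ents_r, ents_f = 0.6996904025, 0.5685534591, 0.6273421235
sents_p, sents_r, sents_f = 0.97265625, 0.9822485207, 0.9774288518

# Allow for the 10-decimal rounding used in the file.
ents_ok = abs(f_score(ents_p, ents_r) - ents_f) < 1e-6
sents_ok = abs(f_score(sents_p, sents_r) - sents_f) < 1e-6
print(ents_ok, sents_ok)
```

The same check can be applied to any per-type entry under `dep_las_per_type` or `ents_per_type`.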
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
 
config.cfg CHANGED
@@ -35,8 +35,9 @@ overwrite = true
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}

  [components.morphologizer.model]
- @architectures = "spacy.Tagger.v1"
+ @architectures = "spacy.Tagger.v2"
  nO = null
+ normalize = false

  [components.morphologizer.model.tok2vec]
  @architectures = "spacy.Tok2VecListener.v1"
@@ -66,7 +67,7 @@ nO = null
  @architectures = "spacy.MultiHashEmbed.v2"
  width = 96
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
- rows = [5000,2500,2500,2500]
+ rows = [5000,1000,2500,2500]
  include_static_vectors = false

  [components.ner.model.tok2vec.encode]
@@ -104,8 +105,9 @@ overwrite = false
  scorer = {"@scorers":"spacy.senter_scorer.v1"}

  [components.senter.model]
- @architectures = "spacy.Tagger.v1"
+ @architectures = "spacy.Tagger.v2"
  nO = null
+ normalize = false

  [components.senter.model.tok2vec]
  @architectures = "spacy.Tok2Vec.v2"
@@ -134,7 +136,7 @@ factory = "tok2vec"
  @architectures = "spacy.MultiHashEmbed.v2"
  width = ${components.tok2vec.model.encode:width}
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
- rows = [5000,2500,2500,2500]
+ rows = [5000,1000,2500,2500]
  include_static_vectors = false

  [components.tok2vec.model.encode]
@@ -171,7 +173,7 @@ dropout = 0.1
  accumulate_gradient = 1
  patience = 5000
  max_epochs = 0
- max_steps = 0
+ max_steps = 100000
  eval_frequency = 1000
  frozen_components = []
  before_to_disk = null
ja_core_news_sm-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:be790d131bfe78f83dfdf9205bdefd603a8438a8b3f125e85dcd3a309f605f53
- size 13032813
+ oid sha256:abe7de8a5653288f79100243b208be37fc0ade5cd07e6bfdfc8b784424cfcf06
+ size 11965420
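The wheel is tracked with Git LFS, so the diff shows only the pointer file (spec version, `oid`, `size`) rather than the binary itself. The following is a minimal sketch of reading such a pointer and computing the size change; the helper `parse_lfs_pointer` is hypothetical and handles only the three keys that appear in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


old = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:be790d131bfe78f83dfdf9205bdefd603a8438a8b3f125e85dcd3a309f605f53\n"
    "size 13032813\n"
)
new = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:abe7de8a5653288f79100243b208be37fc0ade5cd07e6bfdfc8b784424cfcf06\n"
    "size 11965420\n"
)

# A negative delta means the new wheel is smaller than the old one.
delta = new["size"] - old["size"]
print(delta)
```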
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"ja",
3
  "name":"core_news_sm",
4
- "version":"3.2.0",
5
  "description":"Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
- "spacy_version":">=3.2.0,<3.3.0",
11
- "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
@@ -65,10 +65,6 @@
65
  "obl",
66
  "punct"
67
  ],
68
- "senter":[
69
- "I",
70
- "S"
71
- ],
72
  "attribute_ruler":[
73
 
74
  ],
@@ -120,8 +116,8 @@
120
  "token_p":0.9764591282,
121
  "token_r":0.9790021974,
122
  "token_f":0.9777290092,
123
- "pos_acc":0.9624933535,
124
- "morph_acc":0.0040005162,
125
  "morph_micro_p":0.3401360544,
126
  "morph_micro_r":0.9803921569,
127
  "morph_micro_f":0.5050505051,
@@ -142,193 +138,198 @@
142
  "f":0.0
143
  }
144
  },
145
- "sents_p":0.9823529412,
146
- "sents_r":0.9881656805,
147
- "sents_f":0.9852507375,
148
- "dep_uas":0.921981982,
149
- "dep_las":0.9069390694,
150
  "dep_las_per_type":{
151
  "cc":{
152
- "p":0.8163265306,
153
- "r":0.8333333333,
154
- "f":0.824742268
155
  },
156
  "compound":{
157
- "p":0.9411421911,
158
  "r":0.9103720406,
159
- "f":0.9255014327
160
  },
161
  "obl":{
162
- "p":0.8113207547,
163
- "r":0.8052434457,
164
- "f":0.8082706767
165
  },
166
  "case":{
167
- "p":0.9892638037,
168
- "r":0.9802431611,
169
- "f":0.9847328244
170
  },
171
  "dislocated":{
172
- "p":0.7,
173
- "r":0.5384615385,
174
- "f":0.6086956522
175
  },
176
  "nsubj":{
177
- "p":0.8202676864,
178
- "r":0.8234165067,
179
- "f":0.8218390805
180
  },
181
  "nmod":{
182
- "p":0.8793532338,
183
- "r":0.8269005848,
184
- "f":0.8523206751
185
  },
186
  "root":{
187
- "p":0.9741035857,
188
- "r":0.9644970414,
189
- "f":0.9692765114
190
  },
191
  "aux":{
192
- "p":0.9796296296,
193
- "r":0.982358403,
194
- "f":0.9809921187
195
  },
196
  "advcl":{
197
- "p":0.6893424036,
198
- "r":0.6831460674,
199
- "f":0.6862302483
200
  },
201
  "mark":{
202
- "p":0.9755600815,
203
- "r":0.958,
204
- "f":0.9667003027
205
  },
206
  "fixed":{
207
- "p":0.9569892473,
208
  "r":0.9709090909,
209
- "f":0.963898917
210
  },
211
  "acl":{
212
- "p":0.8239130435,
213
- "r":0.832967033,
214
- "f":0.8284153005
215
  },
216
  "obj":{
217
- "p":0.9507692308,
218
- "r":0.9335347432,
219
- "f":0.9420731707
220
  },
221
  "nummod":{
222
- "p":0.9934640523,
223
- "r":0.899408284,
224
- "f":0.9440993789
225
  },
226
  "advmod":{
227
- "p":0.6771653543,
228
- "r":0.6142857143,
229
- "f":0.6441947566
230
  },
231
  "amod":{
232
- "p":0.8709677419,
233
- "r":0.7297297297,
234
- "f":0.7941176471
235
  },
236
  "cop":{
237
- "p":0.9464285714,
238
- "r":0.9244186047,
239
- "f":0.9352941176
240
  },
241
  "ccomp":{
242
- "p":0.8571428571,
243
- "r":0.8181818182,
244
- "f":0.8372093023
245
- },
246
- "det":{
247
  "p":1.0,
248
- "r":0.9811320755,
249
- "f":0.9904761905
250
  },
251
  "dep":{
252
- "p":0.0714285714,
253
- "r":0.1428571429,
254
- "f":0.0952380952
255
  },
256
  "csubj":{
257
- "p":0.6363636364,
258
- "r":0.5833333333,
259
- "f":0.6086956522
 
 
 
 
 
260
  }
261
  },
262
- "tag_acc":0.9715755942,
263
- "lemma_acc":0.9659109444,
264
- "ents_p":0.6767371601,
265
- "ents_r":0.5635220126,
266
- "ents_f":0.6149622512,
267
  "ents_per_type":{
268
  "DATE":{
269
- "p":0.9285714286,
270
  "r":0.9541284404,
271
- "f":0.9411764706
 
 
 
 
 
272
  },
273
  "ORG":{
274
- "p":0.5046728972,
275
- "r":0.3941605839,
276
- "f":0.4426229508
277
  },
278
- "GPE":{
279
- "p":0.6296296296,
280
- "r":0.5425531915,
281
- "f":0.5828571429
282
  },
283
- "PRODUCT":{
284
- "p":0.4285714286,
285
- "r":0.2142857143,
286
- "f":0.2857142857
287
  },
288
  "TIME":{
289
- "p":0.5714285714,
290
  "r":1.0,
291
- "f":0.7272727273
292
- },
293
- "QUANTITY":{
294
- "p":0.7972972973,
295
- "r":0.8939393939,
296
- "f":0.8428571429
297
  },
298
  "PERSON":{
299
- "p":0.6346153846,
300
- "r":0.4748201439,
301
- "f":0.5432098765
302
  },
303
  "NORP":{
304
  "p":0.6666666667,
305
  "r":0.5625,
306
  "f":0.6101694915
307
  },
 
 
 
 
 
308
  "TITLE_AFFIX":{
309
- "p":0.7142857143,
310
- "r":0.5,
311
- "f":0.5882352941
312
  },
313
- "ORDINAL":{
314
- "p":0.5909090909,
315
- "r":0.5909090909,
316
- "f":0.5909090909
317
  },
318
  "WORK_OF_ART":{
319
- "p":0.7692307692,
320
  "r":0.5882352941,
321
- "f":0.6666666667
322
- },
323
- "EVENT":{
324
- "p":0.6,
325
- "r":0.4615384615,
326
- "f":0.5217391304
327
  },
328
  "PERCENT":{
329
- "p":1.0,
330
  "r":0.2857142857,
331
- "f":0.4444444444
 
 
 
 
 
332
  },
333
  "CARDINAL":{
334
  "p":0.0,
@@ -336,14 +337,9 @@
336
  "f":0.0
337
  },
338
  "LOC":{
339
- "p":0.7,
340
- "r":0.7,
341
- "f":0.7
342
- },
343
- "FAC":{
344
- "p":0.44,
345
- "r":0.2972972973,
346
- "f":0.3548387097
347
  },
348
  "MOVEMENT":{
349
  "p":0.0,
@@ -351,14 +347,14 @@
351
  "f":0.0
352
  },
353
  "LAW":{
354
- "p":0.0,
355
- "r":0.0,
356
- "f":0.0
357
  },
358
  "MONEY":{
359
- "p":1.0,
360
  "r":1.0,
361
- "f":1.0
362
  },
363
  "LANGUAGE":{
364
  "p":1.0,
@@ -366,7 +362,7 @@
366
  "f":1.0
367
  }
368
  },
369
- "speed":4844.6752174676
370
  },
371
  "sources":[
372
  {
@@ -383,7 +379,7 @@
383
  }
384
  ],
385
  "requirements":[
386
- "sudachipy>=0.4.9",
387
- "sudachidict-core>=20200330"
388
  ]
389
  }
 
1
  {
2
  "lang":"ja",
3
  "name":"core_news_sm",
4
+ "version":"3.3.0",
5
  "description":"Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
+ "spacy_version":">=3.3.0.dev0,<3.4.0",
11
+ "spacy_git_version":"849bef2de",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
 
65
  "obl",
66
  "punct"
67
  ],
 
 
 
 
68
  "attribute_ruler":[
69
 
70
  ],
 
116
  "token_p":0.9764591282,
117
  "token_r":0.9790021974,
118
  "token_f":0.9777290092,
119
+ "pos_acc":0.9616721888,
120
+ "morph_acc":0.0,
121
  "morph_micro_p":0.3401360544,
122
  "morph_micro_r":0.9803921569,
123
  "morph_micro_f":0.5050505051,
 
138
  "f":0.0
139
  }
140
  },
141
+ "sents_p":0.97265625,
142
+ "sents_r":0.9822485207,
143
+ "sents_f":0.9774288518,
144
+ "dep_uas":0.9207149611,
145
+ "dep_las":0.9061220818,
146
  "dep_las_per_type":{
147
  "cc":{
148
+ "p":0.8260869565,
149
+ "r":0.7916666667,
150
+ "f":0.8085106383
151
  },
152
  "compound":{
153
+ "p":0.9384079024,
154
  "r":0.9103720406,
155
+ "f":0.9241773963
156
  },
157
  "obl":{
158
+ "p":0.813283208,
159
+ "r":0.8102372035,
160
+ "f":0.8117573483
161
  },
162
  "case":{
163
+ "p":0.9881226054,
164
+ "r":0.9798632219,
165
+ "f":0.9839755818
166
  },
167
  "dislocated":{
168
+ "p":0.5,
169
+ "r":0.3846153846,
170
+ "f":0.4347826087
171
  },
172
  "nsubj":{
173
+ "p":0.8188824663,
174
+ "r":0.8157389635,
175
+ "f":0.8173076923
176
  },
177
  "nmod":{
178
+ "p":0.8879093199,
179
+ "r":0.8245614035,
180
+ "f":0.855063675
181
  },
182
  "root":{
183
+ "p":0.9643564356,
184
+ "r":0.9605522682,
185
+ "f":0.9624505929
186
  },
187
  "aux":{
188
+ "p":0.9788213628,
189
+ "r":0.9870009285,
190
+ "f":0.9828941285
191
  },
192
  "advcl":{
193
+ "p":0.6824324324,
194
+ "r":0.6808988764,
195
+ "f":0.6816647919
196
  },
197
  "mark":{
198
+ "p":0.9696969697,
199
+ "r":0.96,
200
+ "f":0.9648241206
201
  },
202
  "fixed":{
203
+ "p":0.963898917,
204
  "r":0.9709090909,
205
+ "f":0.9673913043
206
  },
207
  "acl":{
208
+ "p":0.8252212389,
209
+ "r":0.8197802198,
210
+ "f":0.822491731
211
  },
212
  "obj":{
213
+ "p":0.9446153846,
214
+ "r":0.9274924471,
215
+ "f":0.9359756098
216
  },
217
  "nummod":{
218
+ "p":0.9805194805,
219
+ "r":0.8934911243,
220
+ "f":0.9349845201
221
  },
222
  "advmod":{
223
+ "p":0.6788321168,
224
+ "r":0.6642857143,
225
+ "f":0.6714801444
226
  },
227
  "amod":{
228
+ "p":0.8125,
229
+ "r":0.7027027027,
230
+ "f":0.7536231884
231
  },
232
  "cop":{
233
+ "p":0.9756097561,
234
+ "r":0.9302325581,
235
+ "f":0.9523809524
236
  },
237
  "ccomp":{
 
 
 
 
 
238
  "p":1.0,
239
+ "r":0.8636363636,
240
+ "f":0.9268292683
241
  },
242
  "dep":{
243
+ "p":0.0,
244
+ "r":0.0,
245
+ "f":0.0
246
  },
247
  "csubj":{
248
+ "p":0.5333333333,
249
+ "r":0.6666666667,
250
+ "f":0.5925925926
251
+ },
252
+ "det":{
253
+ "p":1.0,
254
+ "r":0.9811320755,
255
+ "f":0.9904761905
256
  }
257
  },
258
+ "tag_acc":0.9712488769,
259
+ "lemma_acc":0.965013864,
260
+ "ents_p":0.6996904025,
261
+ "ents_r":0.5685534591,
262
+ "ents_f":0.6273421235,
263
  "ents_per_type":{
264
  "DATE":{
265
+ "p":0.9454545455,
266
  "r":0.9541284404,
267
+ "f":0.9497716895
268
+ },
269
+ "PRODUCT":{
270
+ "p":0.4814814815,
271
+ "r":0.3095238095,
272
+ "f":0.3768115942
273
  },
274
  "ORG":{
275
+ "p":0.5148514851,
276
+ "r":0.3795620438,
277
+ "f":0.4369747899
278
  },
279
+ "QUANTITY":{
280
+ "p":0.8243243243,
281
+ "r":0.9242424242,
282
+ "f":0.8714285714
283
  },
284
+ "GPE":{
285
+ "p":0.6179775281,
286
+ "r":0.585106383,
287
+ "f":0.6010928962
288
  },
289
  "TIME":{
290
+ "p":0.6666666667,
291
  "r":1.0,
292
+ "f":0.8
 
 
 
 
 
293
  },
294
  "PERSON":{
295
+ "p":0.6632653061,
296
+ "r":0.4676258993,
297
+ "f":0.5485232068
298
  },
299
  "NORP":{
300
  "p":0.6666666667,
301
  "r":0.5625,
302
  "f":0.6101694915
303
  },
304
+ "ORDINAL":{
305
+ "p":0.5185185185,
306
+ "r":0.6363636364,
307
+ "f":0.5714285714
308
+ },
309
  "TITLE_AFFIX":{
310
+ "p":0.6875,
311
+ "r":0.3666666667,
312
+ "f":0.4782608696
313
  },
314
+ "FAC":{
315
+ "p":0.5882352941,
316
+ "r":0.2702702703,
317
+ "f":0.3703703704
318
  },
319
  "WORK_OF_ART":{
320
+ "p":0.9090909091,
321
  "r":0.5882352941,
322
+ "f":0.7142857143
 
 
 
 
 
323
  },
324
  "PERCENT":{
325
+ "p":0.6666666667,
326
  "r":0.2857142857,
327
+ "f":0.4
328
+ },
329
+ "EVENT":{
330
+ "p":0.8125,
331
+ "r":0.5,
332
+ "f":0.619047619
333
  },
334
  "CARDINAL":{
335
  "p":0.0,
 
337
  "f":0.0
338
  },
339
  "LOC":{
340
+ "p":0.8571428571,
341
+ "r":0.6,
342
+ "f":0.7058823529
 
 
 
 
 
343
  },
344
  "MOVEMENT":{
345
  "p":0.0,
 
347
  "f":0.0
348
  },
349
  "LAW":{
350
+ "p":1.0,
351
+ "r":0.3333333333,
352
+ "f":0.5
353
  },
354
  "MONEY":{
355
+ "p":0.875,
356
  "r":1.0,
357
+ "f":0.9333333333
358
  },
359
  "LANGUAGE":{
360
  "p":1.0,
 
362
  "f":1.0
363
  }
364
  },
365
+ "speed":10590.4387625828
366
  },
367
  "sources":[
368
  {
 
379
  }
380
  ],
381
  "requirements":[
382
+ "sudachipy>=0.5.2,!=0.6.1",
383
+ "sudachidict-core>=20211220"
384
  ]
385
  }
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e4342197134b78e4331bceae7c61472c399e737fbd929670c5e28d29c4c9850f
- size 7749
+ oid sha256:54cc5d13db9e81893cb14d34cce3c20877984a99ace9c89b8339861e1e5daba6
+ size 7801
ner/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e5fb7bacd88de03c8e850b802434ac5a60be29b72fc304a18be3a444398305fe
- size 6734761
+ oid sha256:748916f006122c16ac8203ed42ff9a480c9efb7e7a533ab8d0ca7f21e1df8146
+ size 6158761
parser/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5fd0fed6127d0f802fd7d5821b6f8ce1ccdc81081be70158e6e3ff3174069465
+ oid sha256:271dd83bf4634cda5578e228b20ae8ba0fcb62aa2cdf1a972435c4c9cb13591a
  size 299888
parser/moves CHANGED
@@ -1 +1 @@
- moves {"0":{"":75051},"1":{"":81581},"2":{"compound":22178,"nmod":11296,"obl":10522,"nsubj":6649,"acl":6185,"advcl":5956,"obj":4364,"nummod":2247,"advmod":1841,"punct":1169,"det":822,"cc":699,"amod":357,"ccomp":335,"dislocated":233,"csubj":139,"dep":0},"3":{"case":35390,"punct":15051,"aux":14506,"fixed":7377,"mark":6390,"cop":2079,"compound":542,"advcl":148,"dep":56},"4":{"ROOT":6810}} cfg neg_key
+ moves {"0":{"":77992},"1":{"":83293},"2":{"compound":23506,"nmod":11446,"obl":11030,"nsubj":6884,"advcl":6063,"acl":6020,"obj":4629,"nummod":2487,"advmod":1922,"punct":1321,"det":830,"cc":726,"amod":372,"ccomp":325,"dislocated":235,"csubj":133,"dep":0},"3":{"case":35913,"punct":15455,"aux":14940,"fixed":7391,"mark":6644,"cop":2100,"compound":598,"advcl":152,"dep":58},"4":{"ROOT":7050}} cfg neg_key
senter/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d8c892c86eeb24822aa9fb42c078a7fadda210cb50c0d36f621105028ad090d3
- size 190395
+ oid sha256:c4041306bc68fa4038c05676bd86c4efb4f54f8527eecdee0848625398d79c09
+ size 190447
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5b10a7c2a9e89642c737c9760507e9ed0224cc08a7c3ed34880cd25112114831
- size 6585091
+ oid sha256:12c2a49988bde947980a8fca6448bee9ed02554273da176f6e869cea2f69abdb
+ size 6009091
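The size drop in tok2vec/model and ner/model appears to line up with the `rows` change in config.cfg, where the second embedding table (the `PREFIX` attribute) goes from 2500 to 1000 rows. The following is a back-of-the-envelope sketch, assuming each table stores `rows × width` float32 weights and that the shared tok2vec also uses a width of 96. The config only states `width = 96` for the NER embed block and references the encoder width elsewhere, so that second assumption is inferred from the matching numbers, not confirmed by the diff.

```python
WIDTH = 96             # stated for the NER embed block; assumed for the shared tok2vec
BYTES_PER_FLOAT32 = 4  # assumption: weights are serialized as float32

old_rows = [5000, 2500, 2500, 2500]  # NORM, PREFIX, SUFFIX, SHAPE
new_rows = [5000, 1000, 2500, 2500]

rows_removed = sum(old_rows) - sum(new_rows)
expected_shrink = rows_removed * WIDTH * BYTES_PER_FLOAT32

# Sizes in bytes, copied from the Git LFS pointers in this commit.
tok2vec_shrink = 6585091 - 6009091
ner_shrink = 6734761 - 6158761

print(expected_shrink, tok2vec_shrink, ner_shrink)
```

Under these assumptions all three numbers agree, which suggests that the smaller prefix table accounts for the whole reduction in both files.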
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9ddb6fa8049cb20badee921092f615ca507a8b27a731a85d02502ebd8a765608
- size 1559195
+ oid sha256:056e9c81ef594430afdb87d434bb9926610548e24145c2350a413dde52ee7865
+ size 1603226