osanseviero HF staff committed on
Commit 16a00d2
1 Parent(s): 2f459d5

Update spaCy pipeline

LICENSES_SOURCES CHANGED
@@ -1,4 +1,4 @@
- # UD Japanese GSD v2.6
+ # UD Japanese GSD v2.8
 
  * Author: Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel
  * URL: https://github.com/UniversalDependencies/UD_Japanese-GSD
@@ -438,7 +438,7 @@ Creative Commons may be contacted at creativecommons.org.
 
 
 
- # UD Japanese GSD v2.6 NER
+ # UD Japanese GSD v2.8 NER
 
  * Author: Megagon Labs Tokyo
  * URL: https://github.com/megagonlabs/UD_Japanese-GSD
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
4
  - token-classification
5
  language:
6
  - ja
7
- license: CC-BY-SA-4.0
8
  model-index:
9
  - name: ja_core_news_lg
10
  results:
@@ -14,61 +14,61 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.760989011
18
  - name: NER Recall
19
  type: recall
20
- value: 0.7075351213
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.7332892124
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
  - name: POS Accuracy
29
  type: accuracy
30
- value: 0.9721899386
31
  - task:
32
  name: SENTER
33
  type: token-classification
34
  metrics:
35
  - name: SENTER Precision
36
  type: precision
37
- value: 0.9860557769
38
  - name: SENTER Recall
39
  type: recall
40
- value: 0.9880239521
41
  - name: SENTER F Score
42
  type: f_score
43
- value: 0.9870388833
44
  - task:
45
  name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
  - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.9181002928
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
  - name: Labeled Dependencies Accuracy
56
  type: accuracy
57
- value: 0.9181002928
58
  ---
59
  ### Details: https://spacy.io/models/ja#ja_core_news_lg
60
 
61
- Japanese pipeline optimized for CPU. Components: tok2vec, parser, senter, ner, attribute_ruler.
62
 
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `ja_core_news_lg` |
66
- | **Version** | `3.1.0` |
67
- | **spaCy** | `>=3.1.0,<3.2.0` |
68
- | **Default Pipeline** | `tok2vec`, `parser`, `attribute_ruler`, `ner` |
69
- | **Components** | `tok2vec`, `parser`, `senter`, `attribute_ruler`, `ner` |
70
  | **Vectors** | 480443 keys, 480443 unique vectors (300 dimensions) |
71
- | **Sources** | [UD Japanese GSD v2.6](https://github.com/UniversalDependencies/UD_Japanese-GSD) (Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel)<br />[UD Japanese GSD v2.6 NER](https://github.com/megagonlabs/UD_Japanese-GSD) (Megagon Labs Tokyo)<br />[chiVe: Japanese Word Embedding with Sudachi & NWJC (chive-1.1-mc90-500k)](https://github.com/WorksApplications/chiVe) (Works Applications) |
72
  | **License** | `CC BY-SA 4.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
74
 
@@ -76,10 +76,11 @@ Japanese pipeline optimized for CPU. Components: tok2vec, parser, senter, ner, a
76
 
77
  <details>
78
 
79
- <summary>View label scheme (47 labels for 3 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
 
83
  | **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
84
  | **`senter`** | `I`, `S` |
85
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
@@ -91,14 +92,21 @@ Japanese pipeline optimized for CPU. Components: tok2vec, parser, senter, ner, a
91
  | Type | Score |
92
  | --- | --- |
93
  | `TOKEN_ACC` | 99.69 |
94
- | `TAG_ACC` | 97.22 |
95
- | `POS_ACC` | 96.40 |
96
- | `MORPH_ACC` | 0.00 |
97
- | `DEP_UAS` | 91.81 |
98
- | `DEP_LAS` | 89.98 |
99
- | `ENTS_P` | 76.10 |
100
- | `ENTS_R` | 70.75 |
101
- | `ENTS_F` | 73.33 |
102
- | `SENTS_P` | 98.61 |
103
- | `SENTS_R` | 98.80 |
104
- | `SENTS_F` | 98.70 |
 
 
 
 
 
 
 
 
4
  - token-classification
5
  language:
6
  - ja
7
+ license: cc-by-sa-4.0
8
  model-index:
9
  - name: ja_core_news_lg
10
  results:
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.7402422611
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.6918238994
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.7152145644
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
  - name: POS Accuracy
29
  type: accuracy
30
+ value: 0.9715755942
31
  - task:
32
  name: SENTER
33
  type: token-classification
34
  metrics:
35
  - name: SENTER Precision
36
  type: precision
37
+ value: 0.9862204724
38
  - name: SENTER Recall
39
  type: recall
40
+ value: 0.9881656805
41
  - name: SENTER F Score
42
  type: f_score
43
+ value: 0.9871921182
44
  - task:
45
  name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
  - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
+ value: 0.9214150689
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
  - name: Labeled Dependencies Accuracy
56
  type: accuracy
57
+ value: 0.9214150689
58
  ---
59
  ### Details: https://spacy.io/models/ja#ja_core_news_lg
60
 
61
+ Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler.
62
 
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `ja_core_news_lg` |
66
+ | **Version** | `3.2.0` |
67
+ | **spaCy** | `>=3.2.0,<3.3.0` |
68
+ | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `ner` |
69
+ | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `ner` |
70
  | **Vectors** | 480443 keys, 480443 unique vectors (300 dimensions) |
71
+ | **Sources** | [UD Japanese GSD v2.8](https://github.com/UniversalDependencies/UD_Japanese-GSD) (Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel)<br />[UD Japanese GSD v2.8 NER](https://github.com/megagonlabs/UD_Japanese-GSD) (Megagon Labs Tokyo)<br />[chiVe: Japanese Word Embedding with Sudachi & NWJC (chive-1.1-mc90-500k)](https://github.com/WorksApplications/chiVe) (Works Applications) |
72
  | **License** | `CC BY-SA 4.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
74
 
 
76
 
77
  <details>
78
 
79
+ <summary>View label scheme (66 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
+ | **`morphologizer`** | `POS=NOUN`, `POS=ADP`, `POS=VERB`, `POS=SCONJ`, `POS=AUX`, `POS=PUNCT`, `POS=PART`, `POS=DET`, `POS=NUM`, `POS=ADV`, `POS=PRON`, `POS=ADJ`, `POS=PROPN`, `POS=CCONJ`, `POS=SYM`, `POS=NOUN\|Polarity=Neg`, `POS=AUX\|Polarity=Neg`, `POS=INTJ`, `POS=SCONJ\|Polarity=Neg` |
84
  | **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `compound`, `cop`, `csubj`, `dep`, `det`, `dislocated`, `fixed`, `mark`, `nmod`, `nsubj`, `nummod`, `obj`, `obl`, `punct` |
85
  | **`senter`** | `I`, `S` |
86
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `MOVEMENT`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PET_NAME`, `PHONE`, `PRODUCT`, `QUANTITY`, `TIME`, `TITLE_AFFIX`, `WORK_OF_ART` |
 
92
  | Type | Score |
93
  | --- | --- |
94
  | `TOKEN_ACC` | 99.69 |
95
+ | `TOKEN_P` | 97.65 |
96
+ | `TOKEN_R` | 97.90 |
97
+ | `TOKEN_F` | 97.77 |
98
+ | `POS_ACC` | 97.36 |
99
+ | `MORPH_ACC` | 0.40 |
100
+ | `MORPH_MICRO_P` | 34.01 |
101
+ | `MORPH_MICRO_R` | 98.04 |
102
+ | `MORPH_MICRO_F` | 50.51 |
103
+ | `SENTS_P` | 98.62 |
104
+ | `SENTS_R` | 98.82 |
105
+ | `SENTS_F` | 98.72 |
106
+ | `DEP_UAS` | 92.14 |
107
+ | `DEP_LAS` | 90.81 |
108
+ | `TAG_ACC` | 97.16 |
109
+ | `LEMMA_ACC` | 96.59 |
110
+ | `ENTS_P` | 74.02 |
111
+ | `ENTS_R` | 69.18 |
112
+ | `ENTS_F` | 71.52 |
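
The precision, recall and F rows in the updated table can be cross-checked against each other. The following is a minimal sketch, assuming each F value is the harmonic mean of its P and R values; the helper name `f_score` is hypothetical and not part of spaCy, and the inputs are copied from the new metrics in this commit.

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if p + r == 0.0:
        return 0.0
    return 2.0 * p * r / (p + r)


# Precision/recall pairs copied from the updated metrics in this commit.
ents_f = f_score(0.7402422611, 0.6918238994)
morph_micro_f = f_score(0.3401360544, 0.9803921569)
sents_f = f_score(0.9862204724, 0.9881656805)

# Print rounded to two decimals (as percentages) for comparison with the table.
for name, value in [("ENTS_F", ents_f), ("MORPH_MICRO_F", morph_micro_f), ("SENTS_F", sents_f)]:
    print(f"{name}: {100 * value:.2f}")
```

The low `MORPH_MICRO_F` comes from the gap between a low micro precision and a high micro recall, which the harmonic mean penalises.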
accuracy.json CHANGED
@@ -1,222 +1,243 @@
1
  {
2
- "token_acc": 0.9968965945,
3
- "tag_acc": 0.9721899386,
4
- "pos_acc": 0.9639755682,
5
- "morph_acc": 0.0,
6
- "dep_uas": 0.9181002928,
7
- "dep_las": 0.8998080263,
8
- "ents_p": 0.760989011,
9
- "ents_r": 0.7075351213,
10
- "ents_f": 0.7332892124,
11
- "sents_p": 0.9860557769,
12
- "sents_r": 0.9880239521,
13
- "sents_f": 0.9870388833,
14
- "speed": 11674.7452722222,
15
  "morph_per_feat": {
16
  "Polarity": {
 
 
 
 
 
 
 
 
 
 
17
  "p": 0.0,
18
  "r": 0.0,
19
  "f": 0.0
20
  }
21
  },
 
 
 
 
 
22
  "dep_las_per_type": {
23
  "cc": {
24
- "p": 0.7872340426,
25
- "r": 0.8043478261,
26
- "f": 0.7956989247
27
- },
28
- "nummod": {
29
- "p": 0.9770114943,
30
- "r": 0.8762886598,
31
- "f": 0.9239130435
32
  },
33
  "compound": {
34
- "p": 0.9397972117,
35
- "r": 0.9205462446,
36
- "f": 0.9300721229
37
  },
38
  "obl": {
39
- "p": 0.7827102804,
40
- "r": 0.8052884615,
41
- "f": 0.7938388626
42
  },
43
  "case": {
44
- "p": 0.986533282,
45
- "r": 0.9789996182,
46
- "f": 0.9827520123
47
  },
48
  "dislocated": {
49
  "p": 0.5,
50
- "r": 0.2105263158,
51
- "f": 0.2962962963
52
- },
53
- "nmod": {
54
- "p": 0.8676092545,
55
- "r": 0.813253012,
56
- "f": 0.8395522388
57
  },
58
  "nsubj": {
59
- "p": 0.7950819672,
60
- "r": 0.8083333333,
61
- "f": 0.8016528926
 
 
 
 
 
62
  },
63
  "root": {
64
- "p": 0.9717171717,
65
- "r": 0.9600798403,
66
- "f": 0.9658634538
67
  },
68
  "aux": {
69
- "p": 0.9625090123,
70
- "r": 0.9673913043,
71
- "f": 0.9649439827
72
  },
73
  "advcl": {
74
- "p": 0.6802884615,
75
- "r": 0.6596736597,
76
- "f": 0.6698224852
77
  },
78
  "mark": {
79
- "p": 0.956,
80
- "r": 0.9409448819,
81
- "f": 0.9484126984
 
 
 
 
 
82
  },
83
  "acl": {
84
- "p": 0.7887931034,
85
- "r": 0.8061674009,
86
- "f": 0.7973856209
87
  },
88
  "obj": {
89
- "p": 0.950617284,
90
- "r": 0.9390243902,
91
- "f": 0.9447852761
92
  },
93
- "fixed": {
94
- "p": 0.9421052632,
95
- "r": 0.9835164835,
96
- "f": 0.9623655914
97
  },
98
  "advmod": {
99
- "p": 0.7045454545,
100
- "r": 0.4920634921,
101
- "f": 0.5794392523
102
  },
103
  "amod": {
104
- "p": 0.8888888889,
105
- "r": 0.6,
106
- "f": 0.7164179104
107
  },
108
  "cop": {
109
- "p": 0.9664804469,
110
- "r": 0.9505494505,
111
- "f": 0.9584487535
112
  },
113
  "ccomp": {
114
- "p": 0.9,
115
  "r": 0.8181818182,
116
- "f": 0.8571428571
 
 
 
 
 
117
  },
118
  "det": {
119
- "p": 0.9803921569,
120
- "r": 0.9803921569,
121
- "f": 0.9803921569
122
  },
123
  "dep": {
124
- "p": 0.0,
125
- "r": 0.0,
126
- "f": 0.0
127
- },
128
- "csubj": {
129
- "p": 0.8333333333,
130
- "r": 0.7692307692,
131
- "f": 0.8
132
  }
133
  },
 
 
 
 
 
134
  "ents_per_type": {
135
  "DATE": {
136
- "p": 0.9626168224,
137
- "r": 0.9537037037,
138
- "f": 0.9581395349
139
  },
140
  "ORG": {
141
- "p": 0.6637931034,
142
- "r": 0.5877862595,
143
- "f": 0.6234817814
144
  },
145
  "PERSON": {
146
- "p": 0.780141844,
147
- "r": 0.7913669065,
148
- "f": 0.7857142857
149
  },
150
  "GPE": {
151
- "p": 0.6956521739,
152
- "r": 0.6808510638,
153
- "f": 0.688172043
154
- },
155
- "EVENT": {
156
- "p": 0.6666666667,
157
- "r": 0.6153846154,
158
- "f": 0.64
159
  },
160
- "PRODUCT": {
161
- "p": 0.5666666667,
162
- "r": 0.4146341463,
163
- "f": 0.4788732394
164
  },
165
  "TIME": {
166
  "p": 0.6666666667,
167
  "r": 1.0,
168
  "f": 0.8
169
  },
170
- "QUANTITY": {
171
- "p": 0.8970588235,
172
- "r": 0.9242424242,
173
- "f": 0.9104477612
174
- },
175
  "NORP": {
176
- "p": 0.7037037037,
177
- "r": 0.59375,
178
- "f": 0.6440677966
179
- },
180
- "TITLE_AFFIX": {
181
- "p": 0.8571428571,
182
- "r": 0.6,
183
- "f": 0.7058823529
184
  },
185
  "ORDINAL": {
186
- "p": 0.65,
187
- "r": 0.6842105263,
188
- "f": 0.6666666667
 
 
 
 
 
189
  },
190
  "WORK_OF_ART": {
191
- "p": 0.6875,
192
- "r": 0.6470588235,
193
- "f": 0.6666666667
194
  },
195
- "FAC": {
196
- "p": 0.5769230769,
197
- "r": 0.4054054054,
198
- "f": 0.4761904762
199
  },
200
  "PERCENT": {
201
  "p": 1.0,
202
- "r": 0.4285714286,
203
- "f": 0.6
 
 
 
 
 
 
 
 
 
 
204
  },
205
  "LOC": {
206
- "p": 0.6,
207
- "r": 0.9,
208
- "f": 0.72
209
  },
210
  "MOVEMENT": {
211
- "p": 0.3333333333,
212
- "r": 0.2,
213
- "f": 0.25
214
- },
215
- "LAW": {
216
  "p": 0.0,
217
  "r": 0.0,
218
  "f": 0.0
219
  },
 
 
 
 
 
 
 
 
 
 
220
  "MONEY": {
221
  "p": 1.0,
222
  "r": 1.0,
@@ -226,11 +247,7 @@
226
  "p": 1.0,
227
  "r": 1.0,
228
  "f": 1.0
229
- },
230
- "CARDINAL": {
231
- "p": 0.0,
232
- "r": 0.0,
233
- "f": 0.0
234
  }
235
- }
 
236
  }
 
1
  {
2
+ "token_acc": 0.9968649485,
3
+ "token_p": 0.9764591282,
4
+ "token_r": 0.9790021974,
5
+ "token_f": 0.9777290092,
6
+ "pos_acc": 0.9736163946,
7
+ "morph_acc": 0.0040005162,
8
+ "morph_micro_p": 0.3401360544,
9
+ "morph_micro_r": 0.9803921569,
10
+ "morph_micro_f": 0.5050505051,
 
 
 
 
11
  "morph_per_feat": {
12
  "Polarity": {
13
+ "p": 1.0,
14
+ "r": 0.9803921569,
15
+ "f": 0.9900990099
16
+ },
17
+ "Inflection": {
18
+ "p": 0.0,
19
+ "r": 0.0,
20
+ "f": 0.0
21
+ },
22
+ "Reading": {
23
  "p": 0.0,
24
  "r": 0.0,
25
  "f": 0.0
26
  }
27
  },
28
+ "sents_p": 0.9862204724,
29
+ "sents_r": 0.9881656805,
30
+ "sents_f": 0.9871921182,
31
+ "dep_uas": 0.9214150689,
32
+ "dep_las": 0.9080930316,
33
  "dep_las_per_type": {
34
  "cc": {
35
+ "p": 0.75,
36
+ "r": 0.75,
37
+ "f": 0.75
 
 
 
 
 
38
  },
39
  "compound": {
40
+ "p": 0.9486581097,
41
+ "r": 0.916572717,
42
+ "f": 0.9323394495
43
  },
44
  "obl": {
45
+ "p": 0.8233082707,
46
+ "r": 0.8202247191,
47
+ "f": 0.8217636023
48
  },
49
  "case": {
50
+ "p": 0.9892679187,
51
+ "r": 0.9806231003,
52
+ "f": 0.9849265407
53
  },
54
  "dislocated": {
55
  "p": 0.5,
56
+ "r": 0.4615384615,
57
+ "f": 0.48
 
 
 
 
 
58
  },
59
  "nsubj": {
60
+ "p": 0.8281853282,
61
+ "r": 0.8234165067,
62
+ "f": 0.8257940327
63
+ },
64
+ "nmod": {
65
+ "p": 0.87875,
66
+ "r": 0.8222222222,
67
+ "f": 0.8495468278
68
  },
69
  "root": {
70
+ "p": 0.9560878244,
71
+ "r": 0.9447731755,
72
+ "f": 0.9503968254
73
  },
74
  "aux": {
75
+ "p": 0.9751381215,
76
+ "r": 0.9832869081,
77
+ "f": 0.9791955617
78
  },
79
  "advcl": {
80
+ "p": 0.6810933941,
81
+ "r": 0.6719101124,
82
+ "f": 0.6764705882
83
  },
84
  "mark": {
85
+ "p": 0.971659919,
86
+ "r": 0.96,
87
+ "f": 0.9657947686
88
+ },
89
+ "fixed": {
90
+ "p": 0.9571428571,
91
+ "r": 0.9745454545,
92
+ "f": 0.9657657658
93
  },
94
  "acl": {
95
+ "p": 0.8492239468,
96
+ "r": 0.8417582418,
97
+ "f": 0.8454746137
98
  },
99
  "obj": {
100
+ "p": 0.9662576687,
101
+ "r": 0.9516616314,
102
+ "f": 0.9589041096
103
  },
104
+ "nummod": {
105
+ "p": 0.9806451613,
106
+ "r": 0.899408284,
107
+ "f": 0.9382716049
108
  },
109
  "advmod": {
110
+ "p": 0.6691729323,
111
+ "r": 0.6357142857,
112
+ "f": 0.652014652
113
  },
114
  "amod": {
115
+ "p": 0.9310344828,
116
+ "r": 0.7297297297,
117
+ "f": 0.8181818182
118
  },
119
  "cop": {
120
+ "p": 0.9634146341,
121
+ "r": 0.9186046512,
122
+ "f": 0.9404761905
123
  },
124
  "ccomp": {
125
+ "p": 0.8571428571,
126
  "r": 0.8181818182,
127
+ "f": 0.8372093023
128
+ },
129
+ "csubj": {
130
+ "p": 0.4444444444,
131
+ "r": 0.6666666667,
132
+ "f": 0.5333333333
133
  },
134
  "det": {
135
+ "p": 0.9807692308,
136
+ "r": 0.9622641509,
137
+ "f": 0.9714285714
138
  },
139
  "dep": {
140
+ "p": 0.0769230769,
141
+ "r": 0.1428571429,
142
+ "f": 0.1
 
 
 
 
 
143
  }
144
  },
145
+ "tag_acc": 0.9715755942,
146
+ "lemma_acc": 0.9659109444,
147
+ "ents_p": 0.7402422611,
148
+ "ents_r": 0.6918238994,
149
+ "ents_f": 0.7152145644,
150
  "ents_per_type": {
151
  "DATE": {
152
+ "p": 0.9553571429,
153
+ "r": 0.9816513761,
154
+ "f": 0.9683257919
155
  },
156
  "ORG": {
157
+ "p": 0.5916666667,
158
+ "r": 0.5182481752,
159
+ "f": 0.5525291829
160
  },
161
  "PERSON": {
162
+ "p": 0.7816901408,
163
+ "r": 0.7985611511,
164
+ "f": 0.7900355872
165
  },
166
  "GPE": {
167
+ "p": 0.6774193548,
168
+ "r": 0.670212766,
169
+ "f": 0.6737967914
 
 
 
 
 
170
  },
171
+ "QUANTITY": {
172
+ "p": 0.8194444444,
173
+ "r": 0.8939393939,
174
+ "f": 0.8550724638
175
  },
176
  "TIME": {
177
  "p": 0.6666666667,
178
  "r": 1.0,
179
  "f": 0.8
180
  },
 
 
 
 
 
181
  "NORP": {
182
+ "p": 0.7407407407,
183
+ "r": 0.625,
184
+ "f": 0.6779661017
 
 
 
 
 
185
  },
186
  "ORDINAL": {
187
+ "p": 0.56,
188
+ "r": 0.6363636364,
189
+ "f": 0.5957446809
190
+ },
191
+ "TITLE_AFFIX": {
192
+ "p": 0.7916666667,
193
+ "r": 0.6333333333,
194
+ "f": 0.7037037037
195
  },
196
  "WORK_OF_ART": {
197
+ "p": 0.75,
198
+ "r": 0.7058823529,
199
+ "f": 0.7272727273
200
  },
201
+ "EVENT": {
202
+ "p": 0.8823529412,
203
+ "r": 0.5769230769,
204
+ "f": 0.6976744186
205
  },
206
  "PERCENT": {
207
  "p": 1.0,
208
+ "r": 0.2857142857,
209
+ "f": 0.4444444444
210
+ },
211
+ "CARDINAL": {
212
+ "p": 0.0,
213
+ "r": 0.0,
214
+ "f": 0.0
215
+ },
216
+ "FAC": {
217
+ "p": 0.5666666667,
218
+ "r": 0.4594594595,
219
+ "f": 0.5074626866
220
  },
221
  "LOC": {
222
+ "p": 0.5,
223
+ "r": 0.8,
224
+ "f": 0.6153846154
225
  },
226
  "MOVEMENT": {
 
 
 
 
 
227
  "p": 0.0,
228
  "r": 0.0,
229
  "f": 0.0
230
  },
231
+ "PRODUCT": {
232
+ "p": 0.5384615385,
233
+ "r": 0.3333333333,
234
+ "f": 0.4117647059
235
+ },
236
+ "LAW": {
237
+ "p": 1.0,
238
+ "r": 0.3333333333,
239
+ "f": 0.5
240
+ },
241
  "MONEY": {
242
  "p": 1.0,
243
  "r": 1.0,
 
247
  "p": 1.0,
248
  "r": 1.0,
249
  "f": 1.0
 
 
 
 
 
250
  }
251
+ },
252
+ "speed": 4912.7299798978
253
  }
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
 
config.cfg CHANGED
@@ -1,10 +1,8 @@
1
  [paths]
2
- train = "corpus/ja-core-news/train.spacy"
3
- dev = "corpus/ja-core-news/dev.spacy"
4
- vectors = "corpus/ja_vectors"
5
- raw = null
6
  init_tok2vec = null
7
- vocab_data = null
8
 
9
  [system]
10
  gpu_allocator = null
@@ -12,7 +10,7 @@ seed = 0
12
 
13
  [nlp]
14
  lang = "ja"
15
- pipeline = ["tok2vec","parser","senter","attribute_ruler","ner"]
16
  disabled = ["senter"]
17
  before_creation = null
18
  after_creation = null
@@ -27,12 +25,29 @@ split_mode = null
27
 
28
  [components.attribute_ruler]
29
  factory = "attribute_ruler"
 
30
  validate = false
31
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32
  [components.ner]
33
  factory = "ner"
34
  incorrect_spans_key = null
35
  moves = null
 
36
  update_with_oracle_cut_size = 100
37
 
38
  [components.ner.model]
@@ -66,6 +81,7 @@ factory = "parser"
66
  learn_tokens = false
67
  min_action_freq = 30
68
  moves = null
 
69
  update_with_oracle_cut_size = 100
70
 
71
  [components.parser.model]
@@ -84,6 +100,8 @@ upstream = "tok2vec"
84
 
85
  [components.senter]
86
  factory = "senter"
 
 
87
 
88
  [components.senter.model]
89
  @architectures = "spacy.Tagger.v1"
@@ -130,17 +148,17 @@ maxout_pieces = 3
130
 
131
  [corpora.dev]
132
  @readers = "spacy.Corpus.v1"
133
- limit = 0
134
- max_length = 0
135
- path = ${paths:dev}
136
  gold_preproc = false
 
 
137
  augmenter = null
138
 
139
  [corpora.train]
140
  @readers = "spacy.Corpus.v1"
141
- path = ${paths:train}
142
- max_length = 5000
143
  gold_preproc = false
 
144
  limit = 0
145
  augmenter = null
146
 
@@ -173,9 +191,8 @@ compound = 1.001
173
  t = 0.0
174
 
175
  [training.logger]
176
- @loggers = "spacy.WandbLogger.v1"
177
- project_name = "spacy-v3.0.0a2"
178
- remove_config_values = []
179
 
180
  [training.optimizer]
181
  @optimizers = "Adam.v1"
@@ -189,21 +206,26 @@ eps = 0.00000001
189
  learn_rate = 0.001
190
 
191
  [training.score_weights]
 
 
 
192
  dep_uas = 0.0
193
- dep_las = 0.45
194
  dep_las_per_type = null
195
  sents_p = null
196
  sents_r = null
197
- sents_f = 0.06
198
- ents_f = 0.5
199
  ents_p = 0.0
200
  ents_r = 0.0
201
  ents_per_type = null
 
 
202
 
203
  [pretraining]
204
 
205
  [initialize]
206
- vocab_data = ${paths.vocab_data}
207
  vectors = ${paths.vectors}
208
  init_tok2vec = ${paths.init_tok2vec}
209
  before_init = null
@@ -211,6 +233,13 @@ after_init = null
211
 
212
  [initialize.components]
213
 
 
 
 
 
 
 
 
214
  [initialize.components.ner]
215
 
216
  [initialize.components.ner.labels]
 
1
  [paths]
2
+ train = null
3
+ dev = null
4
+ vectors = null
 
5
  init_tok2vec = null
 
6
 
7
  [system]
8
  gpu_allocator = null
 
10
 
11
  [nlp]
12
  lang = "ja"
13
+ pipeline = ["tok2vec","morphologizer","parser","senter","attribute_ruler","ner"]
14
  disabled = ["senter"]
15
  before_creation = null
16
  after_creation = null
 
25
 
26
  [components.attribute_ruler]
27
  factory = "attribute_ruler"
28
+ scorer = {"@scorers":"spacy.attribute_ruler_scorer.v1"}
29
  validate = false
30
 
31
+ [components.morphologizer]
32
+ factory = "morphologizer"
33
+ extend = true
34
+ overwrite = true
35
+ scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
36
+
37
+ [components.morphologizer.model]
38
+ @architectures = "spacy.Tagger.v1"
39
+ nO = null
40
+
41
+ [components.morphologizer.model.tok2vec]
42
+ @architectures = "spacy.Tok2VecListener.v1"
43
+ width = ${components.tok2vec.model.encode:width}
44
+ upstream = "tok2vec"
45
+
46
  [components.ner]
47
  factory = "ner"
48
  incorrect_spans_key = null
49
  moves = null
50
+ scorer = {"@scorers":"spacy.ner_scorer.v1"}
51
  update_with_oracle_cut_size = 100
52
 
53
  [components.ner.model]
 
81
  learn_tokens = false
82
  min_action_freq = 30
83
  moves = null
84
+ scorer = {"@scorers":"spacy.parser_scorer.v1"}
85
  update_with_oracle_cut_size = 100
86
 
87
  [components.parser.model]
 
100
 
101
  [components.senter]
102
  factory = "senter"
103
+ overwrite = false
104
+ scorer = {"@scorers":"spacy.senter_scorer.v1"}
105
 
106
  [components.senter.model]
107
  @architectures = "spacy.Tagger.v1"
 
148
 
149
  [corpora.dev]
150
  @readers = "spacy.Corpus.v1"
151
+ path = ${paths.dev}
 
 
152
  gold_preproc = false
153
+ max_length = 0
154
+ limit = 0
155
  augmenter = null
156
 
157
  [corpora.train]
158
  @readers = "spacy.Corpus.v1"
159
+ path = ${paths.train}
 
160
  gold_preproc = false
161
+ max_length = 0
162
  limit = 0
163
  augmenter = null
164
 
 
191
  t = 0.0
192
 
193
  [training.logger]
194
+ @loggers = "spacy.ConsoleLogger.v1"
195
+ progress_bar = false
 
196
 
197
  [training.optimizer]
198
  @optimizers = "Adam.v1"
 
206
  learn_rate = 0.001
207
 
208
  [training.score_weights]
209
+ pos_acc = 0.11
210
+ morph_micro_f = 0.33
211
+ morph_per_feat = null
212
  dep_uas = 0.0
213
+ dep_las = 0.21
214
  dep_las_per_type = null
215
  sents_p = null
216
  sents_r = null
217
+ sents_f = 0.03
218
+ ents_f = 0.21
219
  ents_p = 0.0
220
  ents_r = 0.0
221
  ents_per_type = null
222
+ morph_acc = 0.11
223
+ speed = 0.0
224
 
225
  [pretraining]
226
 
227
  [initialize]
228
+ vocab_data = null
229
  vectors = ${paths.vectors}
230
  init_tok2vec = ${paths.init_tok2vec}
231
  before_init = null
 
233
 
234
  [initialize.components]
235
 
236
+ [initialize.components.morphologizer]
237
+
238
+ [initialize.components.morphologizer.labels]
239
+ @readers = "spacy.read_labels.v1"
240
+ path = "corpus/labels/morphologizer.json"
241
+ require = false
242
+
243
  [initialize.components.ner]
244
 
245
  [initialize.components.ner.labels]
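
The new `[training.score_weights]` block spreads the weight across six metrics instead of three. The sketch below assumes that the overall training score is a weighted sum of the individual metrics, and that entries set to `0.0` or `null` contribute nothing. The function name `weighted_score` is hypothetical. The metric values are taken from the updated accuracy.json, so the result is only illustrative and is not a score reported anywhere in this commit.

```python
# Non-zero weights copied from the new [training.score_weights] block.
score_weights = {
    "pos_acc": 0.11,
    "morph_micro_f": 0.33,
    "dep_las": 0.21,
    "sents_f": 0.03,
    "ents_f": 0.21,
    "morph_acc": 0.11,
}

# Metric values copied from the updated accuracy.json in this commit.
scores = {
    "pos_acc": 0.9736163946,
    "morph_micro_f": 0.5050505051,
    "dep_las": 0.9080930316,
    "sents_f": 0.9871921182,
    "ents_f": 0.7152145644,
    "morph_acc": 0.0040005162,
}


def weighted_score(metric_scores: dict, weights: dict) -> float:
    """Weighted sum of metrics; metrics missing from metric_scores count as 0.0."""
    return sum(weight * metric_scores.get(name, 0.0) for name, weight in weights.items())


total_weight = sum(score_weights.values())
final_score = weighted_score(scores, score_weights)
print(f"total weight: {total_weight:.2f}")
print(f"weighted score: {final_score:.4f}")
```

Under this assumption, `morph_micro_f` carries the largest single weight, so the morphologizer's low micro precision pulls the combined score down more than any other metric.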
ja_core_news_lg-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:21c62d122014d7cb087f595285c74023bb80e7f27d9c8f558c89e5593bf1bfb5
- size 555963364
+ oid sha256:1aa64940154d4c04423c309463862d417ab7e232fbf55f790e25f89bb92912bf
+ size 556188875
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"ja",
3
  "name":"core_news_lg",
4
- "version":"3.1.0",
5
- "description":"Japanese pipeline optimized for CPU. Components: tok2vec, parser, senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
- "spacy_version":">=3.1.0,<3.2.0",
11
- "spacy_git_version":"caba63b74",
12
  "vectors":{
13
  "width":300,
14
  "vectors":480443,
@@ -18,6 +18,27 @@
18
  "labels":{
19
  "tok2vec":[
20
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
  ],
22
  "parser":[
23
  "ROOT",
@@ -78,12 +99,14 @@
78
  },
79
  "pipeline":[
80
  "tok2vec",
 
81
  "parser",
82
  "attribute_ruler",
83
  "ner"
84
  ],
85
  "components":[
86
  "tok2vec",
 
87
  "parser",
88
  "senter",
89
  "attribute_ruler",
@@ -93,224 +116,245 @@
93
  "senter"
94
  ],
95
  "performance":{
96
- "token_acc":0.9968965945,
97
- "tag_acc":0.9721899386,
98
- "pos_acc":0.9639755682,
99
- "morph_acc":0.0,
100
- "dep_uas":0.9181002928,
101
- "dep_las":0.8998080263,
102
- "ents_p":0.760989011,
103
- "ents_r":0.7075351213,
104
- "ents_f":0.7332892124,
105
- "sents_p":0.9860557769,
106
- "sents_r":0.9880239521,
107
- "sents_f":0.9870388833,
108
- "speed":11674.7452722222,
109
  "morph_per_feat":{
110
  "Polarity":{
 
 
 
 
 
 
 
 
 
 
111
  "p":0.0,
112
  "r":0.0,
113
  "f":0.0
114
  }
115
  },
 
 
 
 
 
116
  "dep_las_per_type":{
117
  "cc":{
118
- "p":0.7872340426,
119
- "r":0.8043478261,
120
- "f":0.7956989247
121
- },
122
- "nummod":{
123
- "p":0.9770114943,
124
- "r":0.8762886598,
125
- "f":0.9239130435
126
  },
127
  "compound":{
128
- "p":0.9397972117,
129
- "r":0.9205462446,
130
- "f":0.9300721229
131
  },
132
  "obl":{
133
- "p":0.7827102804,
134
- "r":0.8052884615,
135
- "f":0.7938388626
136
  },
137
  "case":{
138
- "p":0.986533282,
139
- "r":0.9789996182,
140
- "f":0.9827520123
141
  },
142
  "dislocated":{
143
  "p":0.5,
144
- "r":0.2105263158,
145
- "f":0.2962962963
146
- },
147
- "nmod":{
148
- "p":0.8676092545,
149
- "r":0.813253012,
150
- "f":0.8395522388
151
  },
152
  "nsubj":{
153
- "p":0.7950819672,
154
- "r":0.8083333333,
155
- "f":0.8016528926
 
 
 
 
 
156
  },
157
  "root":{
158
- "p":0.9717171717,
159
- "r":0.9600798403,
160
- "f":0.9658634538
161
  },
162
  "aux":{
163
- "p":0.9625090123,
164
- "r":0.9673913043,
165
- "f":0.9649439827
166
  },
167
  "advcl":{
168
- "p":0.6802884615,
169
- "r":0.6596736597,
170
- "f":0.6698224852
171
  },
172
  "mark":{
173
- "p":0.956,
174
- "r":0.9409448819,
175
- "f":0.9484126984
 
 
 
 
 
176
  },
177
  "acl":{
178
- "p":0.7887931034,
179
- "r":0.8061674009,
180
- "f":0.7973856209
181
  },
182
  "obj":{
183
- "p":0.950617284,
184
- "r":0.9390243902,
185
- "f":0.9447852761
186
  },
187
- "fixed":{
188
- "p":0.9421052632,
189
- "r":0.9835164835,
190
- "f":0.9623655914
191
  },
192
  "advmod":{
193
- "p":0.7045454545,
194
- "r":0.4920634921,
195
- "f":0.5794392523
196
  },
197
  "amod":{
198
- "p":0.8888888889,
199
- "r":0.6,
200
- "f":0.7164179104
201
  },
202
  "cop":{
203
- "p":0.9664804469,
204
- "r":0.9505494505,
205
- "f":0.9584487535
206
  },
207
  "ccomp":{
208
- "p":0.9,
209
  "r":0.8181818182,
210
- "f":0.8571428571
 
 
 
 
 
211
  },
212
  "det":{
213
- "p":0.9803921569,
214
- "r":0.9803921569,
215
- "f":0.9803921569
216
  },
217
  "dep":{
218
- "p":0.0,
219
- "r":0.0,
220
- "f":0.0
221
- },
222
- "csubj":{
223
- "p":0.8333333333,
224
- "r":0.7692307692,
225
- "f":0.8
226
  }
227
  },
 
 
 
 
 
228
  "ents_per_type":{
229
  "DATE":{
230
- "p":0.9626168224,
231
- "r":0.9537037037,
232
- "f":0.9581395349
233
  },
234
  "ORG":{
235
- "p":0.6637931034,
236
- "r":0.5877862595,
237
- "f":0.6234817814
238
  },
239
  "PERSON":{
240
- "p":0.780141844,
241
- "r":0.7913669065,
242
- "f":0.7857142857
243
  },
244
  "GPE":{
245
- "p":0.6956521739,
246
- "r":0.6808510638,
247
- "f":0.688172043
248
- },
249
- "EVENT":{
250
- "p":0.6666666667,
251
- "r":0.6153846154,
252
- "f":0.64
253
  },
254
- "PRODUCT":{
255
- "p":0.5666666667,
256
- "r":0.4146341463,
257
- "f":0.4788732394
258
  },
259
  "TIME":{
260
  "p":0.6666666667,
261
  "r":1.0,
262
  "f":0.8
263
  },
264
- "QUANTITY":{
265
- "p":0.8970588235,
266
- "r":0.9242424242,
267
- "f":0.9104477612
268
- },
269
  "NORP":{
270
- "p":0.7037037037,
271
- "r":0.59375,
272
- "f":0.6440677966
273
- },
274
- "TITLE_AFFIX":{
275
- "p":0.8571428571,
276
- "r":0.6,
277
- "f":0.7058823529
278
  },
279
  "ORDINAL":{
280
- "p":0.65,
281
- "r":0.6842105263,
282
- "f":0.6666666667
 
 
 
 
 
283
  },
284
  "WORK_OF_ART":{
285
- "p":0.6875,
286
- "r":0.6470588235,
287
- "f":0.6666666667
288
  },
289
- "FAC":{
290
- "p":0.5769230769,
291
- "r":0.4054054054,
292
- "f":0.4761904762
293
  },
294
  "PERCENT":{
295
  "p":1.0,
296
- "r":0.4285714286,
297
- "f":0.6
 
 
 
 
 
 
 
 
 
 
298
  },
299
  "LOC":{
300
- "p":0.6,
301
- "r":0.9,
302
- "f":0.72
303
  },
304
  "MOVEMENT":{
305
- "p":0.3333333333,
306
- "r":0.2,
307
- "f":0.25
308
- },
309
- "LAW":{
310
  "p":0.0,
311
  "r":0.0,
312
  "f":0.0
313
  },
 
 
 
 
 
 
 
 
 
 
314
  "MONEY":{
315
  "p":1.0,
316
  "r":1.0,
@@ -320,23 +364,19 @@
320
  "p":1.0,
321
  "r":1.0,
322
  "f":1.0
323
- },
324
- "CARDINAL":{
325
- "p":0.0,
326
- "r":0.0,
327
- "f":0.0
328
  }
329
- }
 
330
  },
331
  "sources":[
332
  {
333
- "name":"UD Japanese GSD v2.6",
334
  "url":"https://github.com/UniversalDependencies/UD_Japanese-GSD",
335
  "license":"CC BY-SA 4.0",
336
  "author":"Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel"
337
  },
338
  {
339
- "name":"UD Japanese GSD v2.6 NER",
340
  "url":"https://github.com/megagonlabs/UD_Japanese-GSD",
341
  "license":"CC BY-SA 4.0",
342
  "author":"Megagon Labs Tokyo"
 
1
  {
2
  "lang":"ja",
3
  "name":"core_news_lg",
4
+ "version":"3.2.0",
5
+ "description":"Japanese pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
+ "spacy_version":">=3.2.0,<3.3.0",
11
+ "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":300,
14
  "vectors":480443,
 
18
  "labels":{
19
  "tok2vec":[
20
 
21
+ ],
22
+ "morphologizer":[
23
+ "POS=NOUN",
24
+ "POS=ADP",
25
+ "POS=VERB",
26
+ "POS=SCONJ",
27
+ "POS=AUX",
28
+ "POS=PUNCT",
29
+ "POS=PART",
30
+ "POS=DET",
31
+ "POS=NUM",
32
+ "POS=ADV",
33
+ "POS=PRON",
34
+ "POS=ADJ",
35
+ "POS=PROPN",
36
+ "POS=CCONJ",
37
+ "POS=SYM",
38
+ "POS=NOUN|Polarity=Neg",
39
+ "POS=AUX|Polarity=Neg",
40
+ "POS=INTJ",
41
+ "POS=SCONJ|Polarity=Neg"
42
  ],
43
  "parser":[
44
  "ROOT",
 
99
  },
100
  "pipeline":[
101
  "tok2vec",
102
+ "morphologizer",
103
  "parser",
104
  "attribute_ruler",
105
  "ner"
106
  ],
107
  "components":[
108
  "tok2vec",
109
+ "morphologizer",
110
  "parser",
111
  "senter",
112
  "attribute_ruler",
 
  "senter"
  ],
  "performance":{
+ "token_acc":0.9968649485,
+ "token_p":0.9764591282,
+ "token_r":0.9790021974,
+ "token_f":0.9777290092,
+ "pos_acc":0.9736163946,
+ "morph_acc":0.0040005162,
+ "morph_micro_p":0.3401360544,
+ "morph_micro_r":0.9803921569,
+ "morph_micro_f":0.5050505051,
  "morph_per_feat":{
  "Polarity":{
+ "p":1.0,
+ "r":0.9803921569,
+ "f":0.9900990099
+ },
+ "Inflection":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "Reading":{
  "p":0.0,
  "r":0.0,
  "f":0.0
  }
  },
+ "sents_p":0.9862204724,
+ "sents_r":0.9881656805,
+ "sents_f":0.9871921182,
+ "dep_uas":0.9214150689,
+ "dep_las":0.9080930316,
  "dep_las_per_type":{
  "cc":{
+ "p":0.75,
+ "r":0.75,
+ "f":0.75
  },
  "compound":{
+ "p":0.9486581097,
+ "r":0.916572717,
+ "f":0.9323394495
  },
  "obl":{
+ "p":0.8233082707,
+ "r":0.8202247191,
+ "f":0.8217636023
  },
  "case":{
+ "p":0.9892679187,
+ "r":0.9806231003,
+ "f":0.9849265407
  },
  "dislocated":{
  "p":0.5,
+ "r":0.4615384615,
+ "f":0.48
  },
  "nsubj":{
+ "p":0.8281853282,
+ "r":0.8234165067,
+ "f":0.8257940327
+ },
+ "nmod":{
+ "p":0.87875,
+ "r":0.8222222222,
+ "f":0.8495468278
  },
  "root":{
+ "p":0.9560878244,
+ "r":0.9447731755,
+ "f":0.9503968254
  },
  "aux":{
+ "p":0.9751381215,
+ "r":0.9832869081,
+ "f":0.9791955617
  },
  "advcl":{
+ "p":0.6810933941,
+ "r":0.6719101124,
+ "f":0.6764705882
  },
  "mark":{
+ "p":0.971659919,
+ "r":0.96,
+ "f":0.9657947686
+ },
+ "fixed":{
+ "p":0.9571428571,
+ "r":0.9745454545,
+ "f":0.9657657658
  },
  "acl":{
+ "p":0.8492239468,
+ "r":0.8417582418,
+ "f":0.8454746137
  },
  "obj":{
+ "p":0.9662576687,
+ "r":0.9516616314,
+ "f":0.9589041096
  },
+ "nummod":{
+ "p":0.9806451613,
+ "r":0.899408284,
+ "f":0.9382716049
  },
  "advmod":{
+ "p":0.6691729323,
+ "r":0.6357142857,
+ "f":0.652014652
  },
  "amod":{
+ "p":0.9310344828,
+ "r":0.7297297297,
+ "f":0.8181818182
  },
  "cop":{
+ "p":0.9634146341,
+ "r":0.9186046512,
+ "f":0.9404761905
  },
  "ccomp":{
+ "p":0.8571428571,
  "r":0.8181818182,
+ "f":0.8372093023
+ },
+ "csubj":{
+ "p":0.4444444444,
+ "r":0.6666666667,
+ "f":0.5333333333
  },
  "det":{
+ "p":0.9807692308,
+ "r":0.9622641509,
+ "f":0.9714285714
  },
  "dep":{
+ "p":0.0769230769,
+ "r":0.1428571429,
+ "f":0.1
  }
  },
+ "tag_acc":0.9715755942,
+ "lemma_acc":0.9659109444,
+ "ents_p":0.7402422611,
+ "ents_r":0.6918238994,
+ "ents_f":0.7152145644,
  "ents_per_type":{
  "DATE":{
+ "p":0.9553571429,
+ "r":0.9816513761,
+ "f":0.9683257919
  },
  "ORG":{
+ "p":0.5916666667,
+ "r":0.5182481752,
+ "f":0.5525291829
  },
  "PERSON":{
+ "p":0.7816901408,
+ "r":0.7985611511,
+ "f":0.7900355872
  },
  "GPE":{
+ "p":0.6774193548,
+ "r":0.670212766,
+ "f":0.6737967914
  },
+ "QUANTITY":{
+ "p":0.8194444444,
+ "r":0.8939393939,
+ "f":0.8550724638
  },
  "TIME":{
  "p":0.6666666667,
  "r":1.0,
  "f":0.8
  },
  "NORP":{
+ "p":0.7407407407,
+ "r":0.625,
+ "f":0.6779661017
  },
  "ORDINAL":{
+ "p":0.56,
+ "r":0.6363636364,
+ "f":0.5957446809
+ },
+ "TITLE_AFFIX":{
+ "p":0.7916666667,
+ "r":0.6333333333,
+ "f":0.7037037037
  },
  "WORK_OF_ART":{
+ "p":0.75,
+ "r":0.7058823529,
+ "f":0.7272727273
  },
+ "EVENT":{
+ "p":0.8823529412,
+ "r":0.5769230769,
+ "f":0.6976744186
  },
  "PERCENT":{
  "p":1.0,
+ "r":0.2857142857,
+ "f":0.4444444444
+ },
+ "CARDINAL":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "FAC":{
+ "p":0.5666666667,
+ "r":0.4594594595,
+ "f":0.5074626866
  },
  "LOC":{
+ "p":0.5,
+ "r":0.8,
+ "f":0.6153846154
  },
  "MOVEMENT":{
  "p":0.0,
  "r":0.0,
  "f":0.0
  },
+ "PRODUCT":{
+ "p":0.5384615385,
+ "r":0.3333333333,
+ "f":0.4117647059
+ },
+ "LAW":{
+ "p":1.0,
+ "r":0.3333333333,
+ "f":0.5
+ },
  "MONEY":{
  "p":1.0,
  "r":1.0,
 
  "p":1.0,
  "r":1.0,
  "f":1.0
  }
+ },
+ "speed":4912.7299798978
  },
  "sources":[
  {
+ "name":"UD Japanese GSD v2.8",
  "url":"https://github.com/UniversalDependencies/UD_Japanese-GSD",
  "license":"CC BY-SA 4.0",
  "author":"Omura, Mai; Miyao, Yusuke; Kanayama, Hiroshi; Matsuda, Hiroshi; Wakasa, Aya; Yamashita, Kayo; Asahara, Masayuki; Tanaka, Takaaki; Murawaki, Yugo; Matsumoto, Yuji; Mori, Shinsuke; Uematsu, Sumire; McDonald, Ryan; Nivre, Joakim; Zeman, Daniel"
  },
  {
+ "name":"UD Japanese GSD v2.8 NER",
  "url":"https://github.com/megagonlabs/UD_Japanese-GSD",
  "license":"CC BY-SA 4.0",
  "author":"Megagon Labs Tokyo"
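
Review note: the `p`/`r`/`f` triples in the `performance` block of this metadata diff are consistent with the usual F1 definition, the harmonic mean of precision and recall. The sketch below recomputes two of the reported values as a sanity check. The helper name `f_score` is hypothetical and is not part of spaCy; the input numbers are copied from the diff.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the "performance" block in the diff.
morph_micro_f = f_score(0.3401360544, 0.9803921569)
ents_f = f_score(0.7402422611, 0.6918238994)

print(round(morph_micro_f, 10))  # compare with "morph_micro_f" in the diff
print(round(ents_f, 10))         # compare with "ents_f" in the diff
```

The zero guard matters for entries such as `CARDINAL` and `MOVEMENT`, where both precision and recall are reported as 0.0.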
morphologizer/cfg ADDED
@@ -0,0 +1,46 @@
+ {
+ "extend":true,
+ "labels_morph":{
+ "POS=NOUN":"",
+ "POS=ADP":"",
+ "POS=VERB":"",
+ "POS=SCONJ":"",
+ "POS=AUX":"",
+ "POS=PUNCT":"",
+ "POS=PART":"",
+ "POS=DET":"",
+ "POS=NUM":"",
+ "POS=ADV":"",
+ "POS=PRON":"",
+ "POS=ADJ":"",
+ "POS=PROPN":"",
+ "POS=CCONJ":"",
+ "POS=SYM":"",
+ "POS=NOUN|Polarity=Neg":"Polarity=Neg",
+ "POS=AUX|Polarity=Neg":"Polarity=Neg",
+ "POS=INTJ":"",
+ "POS=SCONJ|Polarity=Neg":"Polarity=Neg"
+ },
+ "labels_pos":{
+ "POS=NOUN":92,
+ "POS=ADP":85,
+ "POS=VERB":100,
+ "POS=SCONJ":98,
+ "POS=AUX":87,
+ "POS=PUNCT":97,
+ "POS=PART":94,
+ "POS=DET":90,
+ "POS=NUM":93,
+ "POS=ADV":86,
+ "POS=PRON":95,
+ "POS=ADJ":84,
+ "POS=PROPN":96,
+ "POS=CCONJ":89,
+ "POS=SYM":99,
+ "POS=NOUN|Polarity=Neg":92,
+ "POS=AUX|Polarity=Neg":87,
+ "POS=INTJ":91,
+ "POS=SCONJ|Polarity=Neg":98
+ },
+ "overwrite":true
+ }
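
Review note: in the `morphologizer/cfg` file added above, each combined label such as `POS=NOUN|Polarity=Neg` maps to its non-POS features in `labels_morph` and to an integer in `labels_pos`, and labels that share a part of speech share that integer. The sketch below only mirrors the mapping visible in the file; it is not spaCy's internal code, and `split_label` is a hypothetical helper.

```python
def split_label(label: str) -> tuple[str, str]:
    """Split a combined label into (POS value, remaining morph features)."""
    feats = label.split("|")
    pos_values = [f.split("=", 1)[1] for f in feats if f.startswith("POS=")]
    morph = "|".join(f for f in feats if not f.startswith("POS="))
    return (pos_values[0] if pos_values else "", morph)


# Excerpt of "labels_pos" copied from the cfg in the diff.
labels_pos = {
    "POS=NOUN": 92,
    "POS=NOUN|Polarity=Neg": 92,
    "POS=AUX": 87,
    "POS=AUX|Polarity=Neg": 87,
}

# Group the integer IDs by the POS value extracted from each label.
ids_by_pos: dict[str, set[int]] = {}
for label, pos_id in labels_pos.items():
    pos, _morph = split_label(label)
    ids_by_pos.setdefault(pos, set()).add(pos_id)

print(split_label("POS=NOUN|Polarity=Neg"))
print(ids_by_pos)
```

For this excerpt each POS value ends up with exactly one integer ID, which matches the pattern in the full `labels_pos` table.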
morphologizer/model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa52799c89b4b2f3bfd27211e1bc9724f68b328e6e73efbbd5af21ee973b208d
+ size 7749
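
Review note: `morphologizer/model` is stored as a Git LFS pointer, so the diff shows three `key value` lines (`version`, `oid`, `size`) instead of the model weights. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from the diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields.get("oid", "").partition(":")
    return {
        "version": fields.get("version"),
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fa52799c89b4b2f3bfd27211e1bc9724f68b328e6e73efbbd5af21ee973b208d
size 7749
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The same parsing applies to the `vocab/strings.json` pointer later in this commit, where only the `oid` and `size` lines change.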
ner/model CHANGED
Binary files a/ner/model and b/ner/model differ
 
ner/moves CHANGED
@@ -1 +1 @@
- ��moves��{"0":{},"1":{"DATE":4112,"ORG":3465,"PERSON":2992,"QUANTITY":2502,"GPE":1927,"PRODUCT":1317,"FAC":1230,"ORDINAL":1095,"WORK_OF_ART":1022,"EVENT":865,"NORP":732,"LOC":557,"MONEY":400,"TITLE_AFFIX":343,"TIME":294,"PERCENT":272,"MOVEMENT":148,"LAW":94,"LANGUAGE":78,"CARDINAL":27,"PET_NAME":19,"PHONE":4},"2":{"DATE":4112,"ORG":3465,"PERSON":2992,"QUANTITY":2502,"GPE":1927,"PRODUCT":1317,"FAC":1230,"ORDINAL":1095,"WORK_OF_ART":1022,"EVENT":865,"NORP":732,"LOC":557,"MONEY":400,"TITLE_AFFIX":343,"TIME":294,"PERCENT":272,"MOVEMENT":148,"LAW":94,"LANGUAGE":78,"CARDINAL":27,"PET_NAME":19,"PHONE":4},"3":{"DATE":4112,"ORG":3465,"PERSON":2992,"QUANTITY":2502,"GPE":1927,"PRODUCT":1317,"FAC":1230,"ORDINAL":1095,"WORK_OF_ART":1022,"EVENT":865,"NORP":732,"LOC":557,"MONEY":400,"TITLE_AFFIX":343,"TIME":294,"PERCENT":272,"MOVEMENT":148,"LAW":94,"LANGUAGE":78,"CARDINAL":27,"PET_NAME":19,"PHONE":4},"4":{"DATE":4112,"ORG":3465,"PERSON":2992,"QUANTITY":2502,"GPE":1927,"PRODUCT":1317,"FAC":1230,"ORDINAL":1095,"WORK_OF_ART":1022,"EVENT":865,"NORP":732,"LOC":557,"MONEY":400,"TITLE_AFFIX":343,"TIME":294,"PERCENT":272,"MOVEMENT":148,"LAW":94,"LANGUAGE":78,"CARDINAL":27,"PET_NAME":19,"PHONE":4,"":1},"5":{"":1}}�cfg��neg_key�
 
+ ��moves��{"0":{},"1":{"DATE":4200,"ORG":3487,"PERSON":3042,"QUANTITY":2519,"GPE":1953,"PRODUCT":1328,"FAC":1243,"ORDINAL":1114,"WORK_OF_ART":1053,"EVENT":869,"NORP":734,"LOC":563,"MONEY":400,"TITLE_AFFIX":344,"TIME":300,"PERCENT":274,"MOVEMENT":148,"LAW":94,"LANGUAGE":82,"CARDINAL":27,"PET_NAME":20,"PHONE":4},"2":{"DATE":4200,"ORG":3487,"PERSON":3042,"QUANTITY":2519,"GPE":1953,"PRODUCT":1328,"FAC":1243,"ORDINAL":1114,"WORK_OF_ART":1053,"EVENT":869,"NORP":734,"LOC":563,"MONEY":400,"TITLE_AFFIX":344,"TIME":300,"PERCENT":274,"MOVEMENT":148,"LAW":94,"LANGUAGE":82,"CARDINAL":27,"PET_NAME":20,"PHONE":4},"3":{"DATE":4200,"ORG":3487,"PERSON":3042,"QUANTITY":2519,"GPE":1953,"PRODUCT":1328,"FAC":1243,"ORDINAL":1114,"WORK_OF_ART":1053,"EVENT":869,"NORP":734,"LOC":563,"MONEY":400,"TITLE_AFFIX":344,"TIME":300,"PERCENT":274,"MOVEMENT":148,"LAW":94,"LANGUAGE":82,"CARDINAL":27,"PET_NAME":20,"PHONE":4},"4":{"DATE":4200,"ORG":3487,"PERSON":3042,"QUANTITY":2519,"GPE":1953,"PRODUCT":1328,"FAC":1243,"ORDINAL":1114,"WORK_OF_ART":1053,"EVENT":869,"NORP":734,"LOC":563,"MONEY":400,"TITLE_AFFIX":344,"TIME":300,"PERCENT":274,"MOVEMENT":148,"LAW":94,"LANGUAGE":82,"CARDINAL":27,"PET_NAME":20,"PHONE":4,"":1},"5":{"":1}}�cfg��neg_key�
parser/model CHANGED
Binary files a/parser/model and b/parser/model differ
 
parser/moves CHANGED
@@ -1 +1 @@
- ��moves�q{"0":{"":75008},"1":{"":80671},"2":{"compound":20642,"obl":11201,"nmod":11139,"nsubj":6348,"acl":6215,"advcl":6023,"obj":4334,"nummod":3800,"advmod":1393,"punct":1249,"det":813,"cc":695,"amod":366,"ccomp":327,"dislocated":266,"csubj":159,"dep":0},"3":{"case":35563,"aux":18454,"punct":14888,"mark":6577,"fixed":2698,"cop":2198,"compound":248,"dep":0},"4":{"ROOT":6787}}�cfg��neg_key�
 
+ ��moves�~{"0":{"":75051},"1":{"":81581},"2":{"compound":22178,"nmod":11296,"obl":10522,"nsubj":6649,"acl":6185,"advcl":5956,"obj":4364,"nummod":2247,"advmod":1841,"punct":1169,"det":822,"cc":699,"amod":357,"ccomp":335,"dislocated":233,"csubj":139,"dep":0},"3":{"case":35390,"punct":15051,"aux":14506,"fixed":7377,"mark":6390,"cop":2079,"compound":542,"advcl":148,"dep":56},"4":{"ROOT":6810}}�cfg��neg_key�
senter/cfg CHANGED
@@ -1,3 +1,3 @@
  {
-
+ "overwrite":false
  }
senter/model CHANGED
Binary files a/senter/model and b/senter/model differ
 
tok2vec/model CHANGED
Binary files a/tok2vec/model and b/tok2vec/model differ
 
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:11826e1ac851be0655fd038aba542199014396e08ad6c396ee6b0e70f9f1e0e8
- size 13755465
+ oid sha256:0a3705f398a0ff28af5fa868cf82b05cb31087511af0019249e05f7ee9dab7db
+ size 15570179
vocab/vectors.cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "mode":"default"
+ }