Nitrino committed on
Commit
f46377d
1 Parent(s): de88508

Update spaCy pipeline

LICENSES_SOURCES CHANGED
@@ -64,11 +64,11 @@ Princeton University and LICENSEE agrees to preserve same.```



- # Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl)
+ # GloVe Common Crawl

- * Author: Explosion
- * URL: https://github.com/explosion/spacy-vectors-builder
- * License: CC0
+ * Author: Jeffrey Pennington, Richard Socher, and Christopher D. Manning
+ * URL: https://nlp.stanford.edu/projects/glove/
+ * License: Public Domain Dedication and License v1.0

  ```
  The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
README.md CHANGED
@@ -14,41 +14,41 @@ model-index:
  metrics:
  - name: NER Precision
  type: precision
- value: 0.8494302632
+ value: 0.8531330602
  - name: NER Recall
  type: recall
- value: 0.8549178686
+ value: 0.8448016827
  - name: NER F Score
  type: f_score
- value: 0.8521652315
+ value: 0.8489469314
  - task:
  name: TAG
  type: token-classification
  metrics:
  - name: TAG (XPOS) Accuracy
  type: accuracy
- value: 0.9732581964
+ value: 0.9736958159
  - task:
  name: UNLABELED_DEPENDENCIES
  type: token-classification
  metrics:
  - name: Unlabeled Attachment Score (UAS)
  type: f_score
- value: 0.9205112068
+ value: 0.9186827918
  - task:
  name: LABELED_DEPENDENCIES
  type: token-classification
  metrics:
  - name: Labeled Attachment Score (LAS)
  type: f_score
- value: 0.9022890411
+ value: 0.9006556195
  - task:
  name: SENTS
  type: token-classification
  metrics:
  - name: Sentences F-Score
  type: f_score
- value: 0.9076778775
+ value: 0.9029823331
  ---
  ---
  tags:
@@ -66,41 +66,47 @@ model-index:
  metrics:
  - name: NER Precision
  type: precision
- value: 0.8494302632
+ value: 0.8531330602
  - name: NER Recall
  type: recall
- value: 0.8549178686
+ value: 0.8448016827
  - name: NER F Score
  type: f_score
- value: 0.8521652315
+ value: 0.8489469314
  - task:
- name: TAG
+ name: POS
  type: token-classification
  metrics:
- - name: TAG (XPOS) Accuracy
+ - name: POS Accuracy
  type: accuracy
- value: 0.9732581964
+ value: 0.9736958159
  - task:
- name: UNLABELED_DEPENDENCIES
+ name: SENTER
  type: token-classification
  metrics:
- - name: Unlabeled Attachment Score (UAS)
+ - name: SENTER Precision
+ type: precision
+ value: 0.9144345238
+ - name: SENTER Recall
+ type: recall
+ value: 0.8918134442
+ - name: SENTER F Score
  type: f_score
- value: 0.9205112068
+ value: 0.9029823331
  - task:
- name: LABELED_DEPENDENCIES
+ name: UNLABELED_DEPENDENCIES
  type: token-classification
  metrics:
- - name: Labeled Attachment Score (LAS)
- type: f_score
- value: 0.9022890411
+ - name: Unlabeled Dependencies Accuracy
+ type: accuracy
+ value: 0.9186827918
  - task:
- name: SENTS
+ name: LABELED_DEPENDENCIES
  type: token-classification
  metrics:
- - name: Sentences F-Score
- type: f_score
- value: 0.9076778775
+ - name: Labeled Dependencies Accuracy
+ type: accuracy
+ value: 0.9186827918
  ---
  ### Details: https://spacy.io/models/en#en_core_web_md

@@ -109,12 +115,12 @@ English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter,
  | Feature | Description |
  | --- | --- |
  | **Name** | `en_core_web_md` |
- | **Version** | `3.5.0` |
- | **spaCy** | `>=3.5.0,<3.6.0` |
+ | **Version** | `3.2.0` |
+ | **spaCy** | `>=3.2.0,<3.3.0` |
  | **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
  | **Components** | `tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
- | **Vectors** | 514157 keys, 20000 unique vectors (300 dimensions) |
- | **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[ClearNLP Constituent-to-Dependency Conversion](https://github.com/clir/clearnlp-guidelines/blob/master/md/components/dependency_conversion.md) (Emory University)<br />[WordNet 3.0](https://wordnet.princeton.edu/) (Princeton University)<br />[Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl)](https://github.com/explosion/spacy-vectors-builder) (Explosion) |
+ | **Vectors** | 684830 keys, 20000 unique vectors (300 dimensions) |
+ | **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[ClearNLP Constituent-to-Dependency Conversion](https://github.com/clir/clearnlp-guidelines/blob/master/md/components/dependency_conversion.md) (Emory University)<br />[WordNet 3.0](https://wordnet.princeton.edu/) (Princeton University)<br />[GloVe Common Crawl](https://nlp.stanford.edu/projects/glove/) (Jeffrey Pennington, Richard Socher, and Christopher D. Manning) |
  | **License** | `MIT` |
  | **Author** | [Explosion](https://explosion.ai) |

@@ -122,12 +128,13 @@ English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter,

  <details>

- <summary>View label scheme (113 labels for 3 components)</summary>
+ <summary>View label scheme (114 labels for 4 components)</summary>

  | Component | Labels |
  | --- | --- |
- | **`tagger`** | `$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, `_SP`, ```` |
+ | **`tagger`** | `$`, `''`, `,`, `-LRB-`, `-RRB-`, `.`, `:`, `ADD`, `AFX`, `CC`, `CD`, `DT`, `EX`, `FW`, `HYPH`, `IN`, `JJ`, `JJR`, `JJS`, `LS`, `MD`, `NFP`, `NN`, `NNP`, `NNPS`, `NNS`, `PDT`, `POS`, `PRP`, `PRP$`, `RB`, `RBR`, `RBS`, `RP`, `SYM`, `TO`, `UH`, `VB`, `VBD`, `VBG`, `VBN`, `VBP`, `VBZ`, `WDT`, `WP`, `WP$`, `WRB`, `XX`, ```` |
  | **`parser`** | `ROOT`, `acl`, `acomp`, `advcl`, `advmod`, `agent`, `amod`, `appos`, `attr`, `aux`, `auxpass`, `case`, `cc`, `ccomp`, `compound`, `conj`, `csubj`, `csubjpass`, `dative`, `dep`, `det`, `dobj`, `expl`, `intj`, `mark`, `meta`, `neg`, `nmod`, `npadvmod`, `nsubj`, `nsubjpass`, `nummod`, `oprd`, `parataxis`, `pcomp`, `pobj`, `poss`, `preconj`, `predet`, `prep`, `prt`, `punct`, `quantmod`, `relcl`, `xcomp` |
+ | **`senter`** | `I`, `S` |
  | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART` |

  </details>
@@ -136,16 +143,16 @@ English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter,

  | Type | Score |
  | --- | --- |
- | `TOKEN_ACC` | 99.86 |
+ | `TOKEN_ACC` | 99.93 |
  | `TOKEN_P` | 99.57 |
  | `TOKEN_R` | 99.58 |
  | `TOKEN_F` | 99.57 |
- | `TAG_ACC` | 97.33 |
- | `SENTS_P` | 92.21 |
- | `SENTS_R` | 89.37 |
- | `SENTS_F` | 90.77 |
- | `DEP_UAS` | 92.05 |
- | `DEP_LAS` | 90.23 |
- | `ENTS_P` | 84.94 |
- | `ENTS_R` | 85.49 |
- | `ENTS_F` | 85.22 |
+ | `TAG_ACC` | 97.37 |
+ | `SENTS_P` | 91.44 |
+ | `SENTS_R` | 89.18 |
+ | `SENTS_F` | 90.30 |
+ | `DEP_UAS` | 91.87 |
+ | `DEP_LAS` | 90.07 |
+ | `ENTS_P` | 85.31 |
+ | `ENTS_R` | 84.48 |
+ | `ENTS_F` | 84.89 |
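The README diff above moves the declared spaCy range from `>=3.5.0,<3.6.0` to `>=3.2.0,<3.3.0`, so the updated package targets the 3.2.x release line. The sketch below illustrates what such a range accepts. It assumes plain numeric versions and only `>=` and `<` clauses; `parse_version` and `satisfies` are hypothetical helpers written for this illustration, not spaCy or pip APIs.

```python
def parse_version(text):
    """Turn '3.2.0' into (3, 2, 0). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated list of '>=' and '<' clauses."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if not current >= parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if not current < parse_version(clause[1:]):
                return False
        else:
            # Anything else is outside the scope of this sketch.
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


OLD_SPEC = ">=3.5.0,<3.6.0"  # range before this commit
NEW_SPEC = ">=3.2.0,<3.3.0"  # range after this commit

for candidate in ("3.2.4", "3.3.0", "3.5.0"):
    print(candidate, "old:", satisfies(candidate, OLD_SPEC), "new:", satisfies(candidate, NEW_SPEC))
```

For real dependency resolution, a full version-specifier implementation is the safer choice; the helper here ignores pre-releases and other operators.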
accuracy.json CHANGED
@@ -1,109 +1,109 @@
1
  {
2
- "token_acc": 0.9986194413,
3
- "token_p": 0.9956819193,
4
- "token_r": 0.9957659295,
5
- "token_f": 0.9957239226,
6
- "tag_acc": 0.9732581964,
7
- "sents_p": 0.9220717348,
8
- "sents_r": 0.8937264991,
9
- "sents_f": 0.9076778775,
10
- "dep_uas": 0.9205112068,
11
- "dep_las": 0.9022890411,
12
  "dep_las_per_type": {
13
  "prep": {
14
- "p": 0.8600946728,
15
- "r": 0.8686776703,
16
- "f": 0.8643648651
17
  },
18
  "det": {
19
- "p": 0.9784954995,
20
- "r": 0.9795726984,
21
- "f": 0.9790338026
22
  },
23
  "pobj": {
24
- "p": 0.9627663726,
25
- "r": 0.9687021402,
26
- "f": 0.9657251356
27
  },
28
  "nsubj": {
29
- "p": 0.9581267705,
30
- "r": 0.9483461117,
31
- "f": 0.9532113526
32
  },
33
  "aux": {
34
- "p": 0.9809024694,
35
- "r": 0.9830855515,
36
- "f": 0.9819927971
37
  },
38
  "advmod": {
39
- "p": 0.8600558423,
40
- "r": 0.8552078075,
41
- "f": 0.8576249736
42
  },
43
  "relcl": {
44
- "p": 0.7668209327,
45
- "r": 0.7815674891,
46
- "f": 0.7741239892
47
  },
48
  "root": {
49
- "p": 0.9203949608,
50
- "r": 0.8916155419,
51
- "f": 0.9057767055
52
  },
53
  "xcomp": {
54
- "p": 0.8884574656,
55
- "r": 0.9034458004,
56
- "f": 0.8958889482
57
  },
58
  "amod": {
59
- "p": 0.919493737,
60
- "r": 0.9131195335,
61
- "f": 0.9162955498
62
  },
63
  "compound": {
64
- "p": 0.9178322637,
65
- "r": 0.9318890622,
66
- "f": 0.9248072512
67
  },
68
  "poss": {
69
- "p": 0.9740755627,
70
- "r": 0.9756441224,
71
- "f": 0.9748592116
72
  },
73
  "ccomp": {
74
- "p": 0.7796324081,
75
- "r": 0.8466395112,
76
- "f": 0.8117555165
77
  },
78
  "attr": {
79
- "p": 0.9070904645,
80
- "r": 0.9360807401,
81
- "f": 0.9213576159
82
  },
83
  "case": {
84
- "p": 0.980188212,
85
- "r": 0.9904904905,
86
- "f": 0.9853124222
87
  },
88
  "mark": {
89
- "p": 0.9064065384,
90
- "r": 0.9109697933,
91
- "f": 0.9086824369
92
  },
93
  "intj": {
94
- "p": 0.6742364918,
95
- "r": 0.6307692308,
96
- "f": 0.6517789553
97
  },
98
  "advcl": {
99
- "p": 0.6793032787,
100
- "r": 0.6678418534,
101
- "f": 0.6735238095
102
  },
103
  "cc": {
104
- "p": 0.8407122233,
105
- "r": 0.8357851932,
106
- "f": 0.8382414682
107
  },
108
  "neg": {
109
  "p": 0.9431988042,
@@ -111,220 +111,220 @@
111
  "f": 0.9465
112
  },
113
  "conj": {
114
- "p": 0.7720826076,
115
- "r": 0.7812185297,
116
- "f": 0.7766237017
117
  },
118
  "nsubjpass": {
119
- "p": 0.9211997966,
120
- "r": 0.9292307692,
121
- "f": 0.9251978555
122
  },
123
  "auxpass": {
124
- "p": 0.9465311533,
125
- "r": 0.9758542141,
126
- "f": 0.9609690444
127
  },
128
  "dobj": {
129
- "p": 0.9266134085,
130
- "r": 0.9427842856,
131
- "f": 0.9346289055
132
  },
133
  "nummod": {
134
- "p": 0.9384693618,
135
- "r": 0.9320707071,
136
- "f": 0.9352590903
137
  },
138
  "npadvmod": {
139
- "p": 0.7770219199,
140
- "r": 0.7303730018,
141
- "f": 0.7529756455
142
  },
143
  "prt": {
144
- "p": 0.8134206219,
145
- "r": 0.8906810036,
146
- "f": 0.8502994012
147
  },
148
  "pcomp": {
149
- "p": 0.8900785153,
150
- "r": 0.8732492997,
151
- "f": 0.8815835984
152
  },
153
  "expl": {
154
- "p": 0.9809725159,
155
- "r": 0.9935760171,
156
- "f": 0.9872340426
157
  },
158
  "acl": {
159
- "p": 0.7492762015,
160
- "r": 0.7059465357,
161
- "f": 0.7269662921
162
  },
163
  "agent": {
164
- "p": 0.8900169205,
165
- "r": 0.9426523297,
166
- "f": 0.9155787641
167
  },
168
  "dative": {
169
- "p": 0.8016085791,
170
- "r": 0.6857798165,
171
- "f": 0.739184178
172
  },
173
  "acomp": {
174
- "p": 0.9135460009,
175
- "r": 0.8961451247,
176
- "f": 0.9047619048
177
  },
178
  "dep": {
179
- "p": 0.3758389262,
180
- "r": 0.1818181818,
181
- "f": 0.2450765864
182
  },
183
  "csubj": {
184
- "p": 0.7878787879,
185
- "r": 0.7692307692,
186
- "f": 0.7784431138
187
  },
188
  "quantmod": {
189
- "p": 0.8629893238,
190
- "r": 0.7879772543,
191
- "f": 0.8237791932
192
  },
193
  "nmod": {
194
- "p": 0.7400150716,
195
- "r": 0.5984156002,
196
- "f": 0.6617250674
197
  },
198
  "appos": {
199
- "p": 0.702283105,
200
- "r": 0.6672451193,
201
- "f": 0.6843159066
202
  },
203
  "predet": {
204
- "p": 0.84,
205
  "r": 0.9012875536,
206
- "f": 0.8695652174
207
  },
208
  "preconj": {
209
- "p": 0.3617021277,
210
- "r": 0.5930232558,
211
- "f": 0.449339207
212
  },
213
  "oprd": {
214
- "p": 0.8333333333,
215
- "r": 0.7462686567,
216
- "f": 0.7874015748
217
  },
218
  "parataxis": {
219
- "p": 0.6051948052,
220
- "r": 0.5054229935,
221
- "f": 0.5508274232
222
  },
223
  "meta": {
224
- "p": 0.78125,
225
- "r": 0.4807692308,
226
- "f": 0.5952380952
227
  },
228
  "csubjpass": {
229
- "p": 0.5555555556,
230
- "r": 0.8333333333,
231
- "f": 0.6666666667
232
  }
233
  },
234
- "ents_p": 0.8494302632,
235
- "ents_r": 0.8549178686,
236
- "ents_f": 0.8521652315,
237
  "ents_per_type": {
238
  "DATE": {
239
- "p": 0.8584701146,
240
- "r": 0.88,
241
- "f": 0.8691017401
242
  },
243
  "GPE": {
244
- "p": 0.9209341587,
245
- "r": 0.9129707113,
246
- "f": 0.916935145
247
  },
248
  "ORDINAL": {
249
- "p": 0.7768595041,
250
- "r": 0.8757763975,
251
- "f": 0.8233576642
 
 
 
 
 
252
  },
253
  "ORG": {
254
- "p": 0.8124188101,
255
- "r": 0.8290031813,
256
- "f": 0.8206272143
257
  },
258
  "QUANTITY": {
259
- "p": 0.8053691275,
260
- "r": 0.6593406593,
261
- "f": 0.7250755287
262
  },
263
  "CARDINAL": {
264
- "p": 0.8215281651,
265
- "r": 0.8757431629,
266
- "f": 0.8477697842
267
- },
268
- "FAC": {
269
- "p": 0.425,
270
- "r": 0.3923076923,
271
- "f": 0.408
272
- },
273
- "PERSON": {
274
- "p": 0.8683001531,
275
- "r": 0.9252610966,
276
- "f": 0.8958761258
277
  },
278
  "NORP": {
279
- "p": 0.8922716628,
280
- "r": 0.9144,
281
- "f": 0.9032003161
282
  },
283
  "LOC": {
284
- "p": 0.7168458781,
285
- "r": 0.6369426752,
286
- "f": 0.6745362563
 
 
 
 
 
287
  },
288
  "TIME": {
289
- "p": 0.7065527066,
290
- "r": 0.7251461988,
291
- "f": 0.7157287157
292
  },
293
- "MONEY": {
294
- "p": 0.9112709832,
295
- "r": 0.8972845336,
296
- "f": 0.9042236764
297
  },
298
- "WORK_OF_ART": {
299
- "p": 0.4113475177,
300
- "r": 0.2989690722,
301
- "f": 0.3462686567
302
  },
303
  "EVENT": {
304
- "p": 0.6024096386,
305
- "r": 0.2873563218,
306
- "f": 0.3891050584
 
 
 
 
 
307
  },
308
  "LAW": {
309
- "p": 0.5737704918,
310
- "r": 0.546875,
311
- "f": 0.56
312
  },
313
  "PERCENT": {
314
- "p": 0.9020537125,
315
- "r": 0.8744257274,
316
- "f": 0.8880248834
317
  },
318
  "LANGUAGE": {
319
- "p": 0.7083333333,
320
- "r": 0.53125,
321
- "f": 0.6071428571
322
- },
323
- "PRODUCT": {
324
- "p": 0.6363636364,
325
- "r": 0.2654028436,
326
- "f": 0.3745819398
327
  }
328
  },
329
- "speed": 9607.0019342563
330
  }
 
1
  {
2
+ "token_acc": 0.9993053983,
3
+ "token_p": 0.9956742163,
4
+ "token_r": 0.9957505887,
5
+ "token_f": 0.9957124011,
6
+ "tag_acc": 0.9736958159,
7
+ "sents_p": 0.9144345238,
8
+ "sents_r": 0.8918134442,
9
+ "sents_f": 0.9029823331,
10
+ "dep_uas": 0.9186827918,
11
+ "dep_las": 0.9006556195,
12
  "dep_las_per_type": {
13
  "prep": {
14
+ "p": 0.8569122175,
15
+ "r": 0.8659836843,
16
+ "f": 0.8614240691
17
  },
18
  "det": {
19
+ "p": 0.9770765472,
20
+ "r": 0.9784310528,
21
+ "f": 0.9777533309
22
  },
23
  "pobj": {
24
+ "p": 0.9611128429,
25
+ "r": 0.968623601,
26
+ "f": 0.9648536056
27
  },
28
  "nsubj": {
29
+ "p": 0.9594312375,
30
+ "r": 0.9459802848,
31
+ "f": 0.9526582837
32
  },
33
  "aux": {
34
+ "p": 0.9797621161,
35
+ "r": 0.9826404344,
36
+ "f": 0.9811991644
37
  },
38
  "advmod": {
39
+ "p": 0.8561672709,
40
+ "r": 0.8543664816,
41
+ "f": 0.8552659283
42
  },
43
  "relcl": {
44
+ "p": 0.765480427,
45
+ "r": 0.780478955,
46
+ "f": 0.772906935
47
  },
48
  "root": {
49
+ "p": 0.9166215118,
50
+ "r": 0.8927369879,
51
+ "f": 0.9045216055
52
  },
53
  "xcomp": {
54
+ "p": 0.8828097423,
55
+ "r": 0.8977027997,
56
+ "f": 0.8901939847
57
  },
58
  "amod": {
59
+ "p": 0.92090506,
60
+ "r": 0.9149983803,
61
+ "f": 0.9179422183
62
  },
63
  "compound": {
64
+ "p": 0.917950968,
65
+ "r": 0.9321118289,
66
+ "f": 0.924977203
67
  },
68
  "poss": {
69
+ "p": 0.9744877461,
70
+ "r": 0.9764492754,
71
+ "f": 0.9754675246
72
  },
73
  "ccomp": {
74
+ "p": 0.7754030746,
75
+ "r": 0.8423625255,
76
+ "f": 0.8074970715
77
  },
78
  "attr": {
79
+ "p": 0.8974979822,
80
+ "r": 0.9352396972,
81
+ "f": 0.9159802306
82
  },
83
  "case": {
84
+ "p": 0.9811881188,
85
+ "r": 0.991991992,
86
+ "f": 0.9865604778
87
  },
88
  "mark": {
89
+ "p": 0.9043686734,
90
+ "r": 0.8995760466,
91
+ "f": 0.9019659936
92
  },
93
  "intj": {
94
+ "p": 0.6650717703,
95
+ "r": 0.610989011,
96
+ "f": 0.636884307
97
  },
98
  "advcl": {
99
+ "p": 0.6723033564,
100
+ "r": 0.6607907328,
101
+ "f": 0.666497333
102
  },
103
  "cc": {
104
+ "p": 0.835978836,
105
+ "r": 0.8314794881,
106
+ "f": 0.8337230917
107
  },
108
  "neg": {
109
  "p": 0.9431988042,
 
111
  "f": 0.9465
112
  },
113
  "conj": {
114
+ "p": 0.7615497433,
115
+ "r": 0.7843655589,
116
+ "f": 0.7727892844
117
  },
118
  "nsubjpass": {
119
+ "p": 0.9269311065,
120
+ "r": 0.9107692308,
121
+ "f": 0.9187790998
122
  },
123
  "auxpass": {
124
+ "p": 0.9508050089,
125
+ "r": 0.9685649203,
126
+ "f": 0.9596027985
127
  },
128
  "dobj": {
129
+ "p": 0.9220839813,
130
+ "r": 0.9449358515,
131
+ "f": 0.9333700657
132
  },
133
  "nummod": {
134
+ "p": 0.9399338254,
135
+ "r": 0.9325757576,
136
+ "f": 0.9362403346
137
  },
138
  "npadvmod": {
139
+ "p": 0.7793445122,
140
+ "r": 0.7264653641,
141
+ "f": 0.7519764663
142
  },
143
  "prt": {
144
+ "p": 0.8145094806,
145
+ "r": 0.8853046595,
146
+ "f": 0.8484328038
147
  },
148
  "pcomp": {
149
+ "p": 0.8889679715,
150
+ "r": 0.8746498599,
151
+ "f": 0.8817507942
152
  },
153
  "expl": {
154
+ "p": 0.983014862,
155
+ "r": 0.9914346895,
156
+ "f": 0.987206823
157
  },
158
  "acl": {
159
+ "p": 0.7449741528,
160
+ "r": 0.7075831969,
161
+ "f": 0.7257974259
162
  },
163
  "agent": {
164
+ "p": 0.8957264957,
165
+ "r": 0.9390681004,
166
+ "f": 0.9168853893
167
  },
168
  "dative": {
169
+ "p": 0.7732997481,
170
+ "r": 0.7041284404,
171
+ "f": 0.7370948379
172
  },
173
  "acomp": {
174
+ "p": 0.9094236048,
175
+ "r": 0.9015873016,
176
+ "f": 0.9054884992
177
  },
178
  "dep": {
179
+ "p": 0.3909465021,
180
+ "r": 0.1542207792,
181
+ "f": 0.2211874272
182
  },
183
  "csubj": {
184
+ "p": 0.8098591549,
185
+ "r": 0.6804733728,
186
+ "f": 0.7395498392
187
  },
188
  "quantmod": {
189
+ "p": 0.8739800544,
190
+ "r": 0.7831031682,
191
+ "f": 0.8260497001
192
  },
193
  "nmod": {
194
+ "p": 0.7614457831,
195
+ "r": 0.5776965265,
196
+ "f": 0.656964657
197
  },
198
  "appos": {
199
+ "p": 0.6850678733,
200
+ "r": 0.6568329718,
201
+ "f": 0.6706533776
202
  },
203
  "predet": {
204
+ "p": 0.8467741935,
205
  "r": 0.9012875536,
206
+ "f": 0.8731808732
207
  },
208
  "preconj": {
209
+ "p": 0.5454545455,
210
+ "r": 0.6279069767,
211
+ "f": 0.5837837838
212
  },
213
  "oprd": {
214
+ "p": 0.8413793103,
215
+ "r": 0.728358209,
216
+ "f": 0.7808
217
  },
218
  "parataxis": {
219
+ "p": 0.6129943503,
220
+ "r": 0.4707158351,
221
+ "f": 0.5325153374
222
  },
223
  "meta": {
224
+ "p": 0.8,
225
+ "r": 0.3076923077,
226
+ "f": 0.4444444444
227
  },
228
  "csubjpass": {
229
+ "p": 0.5714285714,
230
+ "r": 0.6666666667,
231
+ "f": 0.6153846154
232
  }
233
  },
234
+ "ents_p": 0.8531330602,
235
+ "ents_r": 0.8448016827,
236
+ "ents_f": 0.8489469314,
237
  "ents_per_type": {
238
  "DATE": {
239
+ "p": 0.8645998102,
240
+ "r": 0.8676190476,
241
+ "f": 0.8661067977
242
  },
243
  "GPE": {
244
+ "p": 0.9183846371,
245
+ "r": 0.9071129707,
246
+ "f": 0.9127140051
247
  },
248
  "ORDINAL": {
249
+ "p": 0.7765363128,
250
+ "r": 0.8633540373,
251
+ "f": 0.8176470588
252
+ },
253
+ "PERSON": {
254
+ "p": 0.8805737449,
255
+ "r": 0.9216710183,
256
+ "f": 0.9006538032
257
  },
258
  "ORG": {
259
+ "p": 0.8025329543,
260
+ "r": 0.8231707317,
261
+ "f": 0.8127208481
262
  },
263
  "QUANTITY": {
264
+ "p": 0.7697841727,
265
+ "r": 0.5879120879,
266
+ "f": 0.6666666667
267
  },
268
  "CARDINAL": {
269
+ "p": 0.8279202279,
270
+ "r": 0.8638525565,
271
+ "f": 0.8455048007
 
 
 
 
 
 
 
 
 
 
272
  },
273
  "NORP": {
274
+ "p": 0.9102667745,
275
+ "r": 0.9008,
276
+ "f": 0.905508645
277
  },
278
  "LOC": {
279
+ "p": 0.7022058824,
280
+ "r": 0.6082802548,
281
+ "f": 0.6518771331
282
+ },
283
+ "FAC": {
284
+ "p": 0.4122807018,
285
+ "r": 0.3615384615,
286
+ "f": 0.3852459016
287
  },
288
  "TIME": {
289
+ "p": 0.7450980392,
290
+ "r": 0.6666666667,
291
+ "f": 0.7037037037
292
  },
293
+ "PRODUCT": {
294
+ "p": 0.6376811594,
295
+ "r": 0.2085308057,
296
+ "f": 0.3142857143
297
  },
298
+ "MONEY": {
299
+ "p": 0.9027611044,
300
+ "r": 0.8878394333,
301
+ "f": 0.8952380952
302
  },
303
  "EVENT": {
304
+ "p": 0.6043956044,
305
+ "r": 0.316091954,
306
+ "f": 0.4150943396
307
+ },
308
+ "WORK_OF_ART": {
309
+ "p": 0.5317460317,
310
+ "r": 0.3453608247,
311
+ "f": 0.41875
312
  },
313
  "LAW": {
314
+ "p": 0.4666666667,
315
+ "r": 0.328125,
316
+ "f": 0.3853211009
317
  },
318
  "PERCENT": {
319
+ "p": 0.9090909091,
320
+ "r": 0.8728943338,
321
+ "f": 0.890625
322
  },
323
  "LANGUAGE": {
324
+ "p": 0.6956521739,
325
+ "r": 0.5,
326
+ "f": 0.5818181818
 
 
 
 
 
327
  }
328
  },
329
+ "speed": 7620.1455610511
330
  }
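Each `f` entry in accuracy.json appears to be the harmonic mean of the matching `p` and `r`. The sketch below recomputes two of the new top-level values as a sanity check. It assumes the standard F1 definition, and `f_score` is a hypothetical helper, not a function taken from spaCy.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1); returns 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (p, r, reported f) copied from the new side of the accuracy.json diff.
REPORTED = {
    "ents": (0.8531330602, 0.8448016827, 0.8489469314),
    "sents": (0.9144345238, 0.8918134442, 0.9029823331),
}

for name, (p, r, reported_f) in REPORTED.items():
    recomputed = f_score(p, r)
    # Inputs are rounded to ten decimals, so allow a small tolerance.
    matches = abs(recomputed - reported_f) < 1e-6
    print(f"{name}: recomputed={recomputed:.10f} reported={reported_f:.10f} match={matches}")
```

The same check can be applied to any per-type entry under `dep_las_per_type` or `ents_per_type`.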
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
 
config.cfg CHANGED
@@ -54,8 +54,8 @@ nO = null
  [components.ner.model.tok2vec.embed]
  @architectures = "spacy.MultiHashEmbed.v2"
  width = 96
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
- rows = [5000,1000,2500,2500]
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
+ rows = [5000,2500,2500,2500,100]
  include_static_vectors = true

  [components.ner.model.tok2vec.encode]
@@ -93,9 +93,8 @@ overwrite = false
  scorer = {"@scorers":"spacy.senter_scorer.v1"}

  [components.senter.model]
- @architectures = "spacy.Tagger.v2"
+ @architectures = "spacy.Tagger.v1"
  nO = null
- normalize = false

  [components.senter.model.tok2vec]
  @architectures = "spacy.Tok2Vec.v2"
@@ -116,14 +115,12 @@ maxout_pieces = 2

  [components.tagger]
  factory = "tagger"
- neg_prefix = "!"
  overwrite = false
  scorer = {"@scorers":"spacy.tagger_scorer.v1"}

  [components.tagger.model]
- @architectures = "spacy.Tagger.v2"
+ @architectures = "spacy.Tagger.v1"
  nO = null
- normalize = false

  [components.tagger.model.tok2vec]
  @architectures = "spacy.Tok2VecListener.v1"
@@ -139,8 +136,8 @@ factory = "tok2vec"
  [components.tok2vec.model.embed]
  @architectures = "spacy.MultiHashEmbed.v2"
  width = ${components.tok2vec.model.encode:width}
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY","IS_SPACE"]
- rows = [5000,1000,2500,2500,50,50]
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
+ rows = [5000,2500,2500,2500,100]
  include_static_vectors = true

  [components.tok2vec.model.encode]
@@ -177,12 +174,11 @@ dropout = 0.1
  accumulate_gradient = 1
  patience = 5000
  max_epochs = 0
- max_steps = 100000
+ max_steps = 0
  eval_frequency = 1000
  frozen_components = []
  before_to_disk = null
  annotating_components = []
- before_update = null

  [training.batcher]
  @batchers = "spacy.batch_by_words.v1"
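Both `MultiHashEmbed` blocks in config.cfg change `attrs` and `rows` together, because `rows` supplies one hash-table size per entry in `attrs`. The sketch below pairs the two lists for the shared `tok2vec` embed block before and after the commit and rejects a length mismatch. The value strings are copied from the diff; `table_sizes` is a hypothetical helper, and parsing the values as JSON is an assumption that happens to hold for these list literals.

```python
import json

# Raw values of [components.tok2vec.model.embed] as they appear in the diff.
OLD_EMBED = {
    "attrs": '["NORM","PREFIX","SUFFIX","SHAPE","SPACY","IS_SPACE"]',
    "rows": "[5000,1000,2500,2500,50,50]",
}
NEW_EMBED = {
    "attrs": '["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]',
    "rows": "[5000,2500,2500,2500,100]",
}


def table_sizes(section):
    """Map each embedded attribute to its row count, rejecting mismatched lists."""
    attrs = json.loads(section["attrs"])
    rows = json.loads(section["rows"])
    if len(attrs) != len(rows):
        raise ValueError(f"{len(attrs)} attrs but {len(rows)} row counts")
    return dict(zip(attrs, rows))


old_sizes = table_sizes(OLD_EMBED)
new_sizes = table_sizes(NEW_EMBED)

for attr in sorted(set(old_sizes) | set(new_sizes)):
    print(f"{attr:9s} old={old_sizes.get(attr)} new={new_sizes.get(attr)}")
print("total rows:", sum(old_sizes.values()), "->", sum(new_sizes.values()))
```

Editing only one of the two lists by hand is an easy mistake, which is why the helper fails loudly on a length mismatch.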
en_core_web_md-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a7006734d285e428cf5ac4f86edc1722e07306d421431b176acaead4b31f2ca7
- size 191921538
+ oid sha256:888f818625ab02fcfb998ffd7b5c7379d1408a4f160d74669dbc473a36a0091d
+ size 281122205
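The wheel is tracked with Git LFS, so this diff only shows the pointer file: a spec version line, the object's SHA-256 `oid`, and its `size` in bytes. The sketch below parses both pointers from the diff and reports how much the package grew. `parse_pointer` is a hypothetical helper that assumes the simple `key value` line layout shown here.

```python
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a7006734d285e428cf5ac4f86edc1722e07306d421431b176acaead4b31f2ca7
size 191921538
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:888f818625ab02fcfb998ffd7b5c7379d1408a4f160d74669dbc473a36a0091d
size 281122205
"""


def parse_pointer(text):
    """Split each 'key value' line of a pointer file into a dict; size becomes an int."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


old = parse_pointer(OLD_POINTER)
new = parse_pointer(NEW_POINTER)
delta = new["size"] - old["size"]
print(f"wheel size changed by {delta} bytes ({delta / old['size']:.1%} of the old size)")
```

The size increase lines up with the larger vector table reported in the README diff (684830 keys, up from 514157), though the diff itself does not state the cause.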
meta.json CHANGED
@@ -1,18 +1,18 @@
1
  {
2
  "lang":"en",
3
  "name":"core_web_md",
4
- "version":"3.5.0",
5
  "description":"English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"MIT",
10
- "spacy_version":">=3.5.0,<3.6.0",
11
- "spacy_git_version":"9e0322de1",
12
  "vectors":{
13
  "width":300,
14
  "vectors":20000,
15
- "keys":514157,
16
  "name":"en_vectors"
17
  },
18
  "labels":{
@@ -68,7 +68,6 @@
68
  "WP$",
69
  "WRB",
70
  "XX",
71
- "_SP",
72
  "``"
73
  ],
74
  "parser":[
@@ -118,6 +117,10 @@
118
  "relcl",
119
  "xcomp"
120
  ],
 
 
 
 
121
  "attribute_ruler":[
122
 
123
  ],
@@ -166,111 +169,111 @@
166
  "senter"
167
  ],
168
  "performance":{
169
- "token_acc":0.9986194413,
170
- "token_p":0.9956819193,
171
- "token_r":0.9957659295,
172
- "token_f":0.9957239226,
173
- "tag_acc":0.9732581964,
174
- "sents_p":0.9220717348,
175
- "sents_r":0.8937264991,
176
- "sents_f":0.9076778775,
177
- "dep_uas":0.9205112068,
178
- "dep_las":0.9022890411,
179
  "dep_las_per_type":{
180
  "prep":{
181
- "p":0.8600946728,
182
- "r":0.8686776703,
183
- "f":0.8643648651
184
  },
185
  "det":{
186
- "p":0.9784954995,
187
- "r":0.9795726984,
188
- "f":0.9790338026
189
  },
190
  "pobj":{
191
- "p":0.9627663726,
192
- "r":0.9687021402,
193
- "f":0.9657251356
194
  },
195
  "nsubj":{
196
- "p":0.9581267705,
197
- "r":0.9483461117,
198
- "f":0.9532113526
199
  },
200
  "aux":{
201
- "p":0.9809024694,
202
- "r":0.9830855515,
203
- "f":0.9819927971
204
  },
205
  "advmod":{
206
- "p":0.8600558423,
207
- "r":0.8552078075,
208
- "f":0.8576249736
209
  },
210
  "relcl":{
211
- "p":0.7668209327,
212
- "r":0.7815674891,
213
- "f":0.7741239892
214
  },
215
  "root":{
216
- "p":0.9203949608,
217
- "r":0.8916155419,
218
- "f":0.9057767055
219
  },
220
  "xcomp":{
221
- "p":0.8884574656,
222
- "r":0.9034458004,
223
- "f":0.8958889482
224
  },
225
  "amod":{
226
- "p":0.919493737,
227
- "r":0.9131195335,
228
- "f":0.9162955498
229
  },
230
  "compound":{
231
- "p":0.9178322637,
232
- "r":0.9318890622,
233
- "f":0.9248072512
234
  },
235
  "poss":{
236
- "p":0.9740755627,
237
- "r":0.9756441224,
238
- "f":0.9748592116
239
  },
240
  "ccomp":{
241
- "p":0.7796324081,
242
- "r":0.8466395112,
243
- "f":0.8117555165
244
  },
245
  "attr":{
246
- "p":0.9070904645,
247
- "r":0.9360807401,
248
- "f":0.9213576159
249
  },
250
  "case":{
251
- "p":0.980188212,
252
- "r":0.9904904905,
253
- "f":0.9853124222
254
  },
255
  "mark":{
256
- "p":0.9064065384,
257
- "r":0.9109697933,
258
- "f":0.9086824369
259
  },
260
  "intj":{
261
- "p":0.6742364918,
262
- "r":0.6307692308,
263
- "f":0.6517789553
264
  },
265
  "advcl":{
266
- "p":0.6793032787,
267
- "r":0.6678418534,
268
- "f":0.6735238095
269
  },
270
  "cc":{
271
- "p":0.8407122233,
272
- "r":0.8357851932,
273
- "f":0.8382414682
274
  },
275
  "neg":{
276
  "p":0.9431988042,
@@ -278,222 +281,222 @@
278
  "f":0.9465
279
  },
280
  "conj":{
281
- "p":0.7720826076,
282
- "r":0.7812185297,
283
- "f":0.7766237017
284
  },
285
  "nsubjpass":{
286
- "p":0.9211997966,
287
- "r":0.9292307692,
288
- "f":0.9251978555
289
  },
290
  "auxpass":{
291
- "p":0.9465311533,
292
- "r":0.9758542141,
293
- "f":0.9609690444
294
  },
295
  "dobj":{
296
- "p":0.9266134085,
297
- "r":0.9427842856,
298
- "f":0.9346289055
299
  },
300
  "nummod":{
301
- "p":0.9384693618,
302
- "r":0.9320707071,
303
- "f":0.9352590903
304
  },
305
  "npadvmod":{
306
- "p":0.7770219199,
307
- "r":0.7303730018,
308
- "f":0.7529756455
309
  },
310
  "prt":{
311
- "p":0.8134206219,
312
- "r":0.8906810036,
313
- "f":0.8502994012
314
  },
315
  "pcomp":{
316
- "p":0.8900785153,
317
- "r":0.8732492997,
318
- "f":0.8815835984
319
  },
320
  "expl":{
321
- "p":0.9809725159,
322
- "r":0.9935760171,
323
- "f":0.9872340426
324
  },
325
  "acl":{
326
- "p":0.7492762015,
327
- "r":0.7059465357,
328
- "f":0.7269662921
329
  },
330
  "agent":{
331
- "p":0.8900169205,
332
- "r":0.9426523297,
333
- "f":0.9155787641
334
  },
335
  "dative":{
336
- "p":0.8016085791,
337
- "r":0.6857798165,
338
- "f":0.739184178
339
  },
340
  "acomp":{
341
- "p":0.9135460009,
342
- "r":0.8961451247,
343
- "f":0.9047619048
344
  },
345
  "dep":{
346
- "p":0.3758389262,
347
- "r":0.1818181818,
348
- "f":0.2450765864
349
  },
350
  "csubj":{
351
- "p":0.7878787879,
352
- "r":0.7692307692,
353
- "f":0.7784431138
354
  },
355
  "quantmod":{
356
- "p":0.8629893238,
357
- "r":0.7879772543,
358
- "f":0.8237791932
359
  },
360
  "nmod":{
361
- "p":0.7400150716,
362
- "r":0.5984156002,
363
- "f":0.6617250674
364
  },
365
  "appos":{
366
- "p":0.702283105,
367
- "r":0.6672451193,
368
- "f":0.6843159066
369
  },
370
  "predet":{
371
- "p":0.84,
372
  "r":0.9012875536,
373
- "f":0.8695652174
374
  },
375
  "preconj":{
376
- "p":0.3617021277,
377
- "r":0.5930232558,
378
- "f":0.449339207
379
  },
380
  "oprd":{
381
- "p":0.8333333333,
382
- "r":0.7462686567,
383
- "f":0.7874015748
384
  },
385
  "parataxis":{
386
- "p":0.6051948052,
387
- "r":0.5054229935,
388
- "f":0.5508274232
389
  },
390
  "meta":{
391
- "p":0.78125,
392
- "r":0.4807692308,
393
- "f":0.5952380952
394
  },
395
  "csubjpass":{
396
- "p":0.5555555556,
397
- "r":0.8333333333,
398
- "f":0.6666666667
399
  }
400
  },
401
- "ents_p":0.8494302632,
402
- "ents_r":0.8549178686,
403
- "ents_f":0.8521652315,
404
  "ents_per_type":{
405
  "DATE":{
406
- "p":0.8584701146,
407
- "r":0.88,
408
- "f":0.8691017401
409
  },
410
  "GPE":{
411
- "p":0.9209341587,
412
- "r":0.9129707113,
413
- "f":0.916935145
414
  },
415
  "ORDINAL":{
416
- "p":0.7768595041,
417
- "r":0.8757763975,
418
- "f":0.8233576642
419
  },
420
  "ORG":{
421
- "p":0.8124188101,
422
- "r":0.8290031813,
423
- "f":0.8206272143
424
  },
425
  "QUANTITY":{
426
- "p":0.8053691275,
427
- "r":0.6593406593,
428
- "f":0.7250755287
429
  },
430
  "CARDINAL":{
431
- "p":0.8215281651,
432
- "r":0.8757431629,
433
- "f":0.8477697842
434
- },
435
- "FAC":{
436
- "p":0.425,
437
- "r":0.3923076923,
438
- "f":0.408
439
- },
440
- "PERSON":{
441
- "p":0.8683001531,
442
- "r":0.9252610966,
443
- "f":0.8958761258
444
  },
445
  "NORP":{
446
- "p":0.8922716628,
447
- "r":0.9144,
448
- "f":0.9032003161
449
  },
450
  "LOC":{
451
- "p":0.7168458781,
452
- "r":0.6369426752,
453
- "f":0.6745362563
454
  },
455
  "TIME":{
456
- "p":0.7065527066,
457
- "r":0.7251461988,
458
- "f":0.7157287157
459
  },
460
- "MONEY":{
461
- "p":0.9112709832,
462
- "r":0.8972845336,
463
- "f":0.9042236764
464
  },
465
- "WORK_OF_ART":{
466
- "p":0.4113475177,
467
- "r":0.2989690722,
468
- "f":0.3462686567
469
  },
470
  "EVENT":{
471
- "p":0.6024096386,
472
- "r":0.2873563218,
473
- "f":0.3891050584
474
  },
475
  "LAW":{
476
- "p":0.5737704918,
477
- "r":0.546875,
478
- "f":0.56
479
  },
480
  "PERCENT":{
481
- "p":0.9020537125,
482
- "r":0.8744257274,
483
- "f":0.8880248834
484
  },
485
  "LANGUAGE":{
486
- "p":0.7083333333,
487
- "r":0.53125,
488
- "f":0.6071428571
489
- },
490
- "PRODUCT":{
491
- "p":0.6363636364,
492
- "r":0.2654028436,
493
- "f":0.3745819398
494
  }
495
  },
496
- "speed":9607.0019342563
497
  },
498
  "sources":[
499
  {
@@ -515,10 +518,10 @@
515
  "license":"WordNet 3.0 License"
516
  },
517
  {
518
- "name":"Explosion Vectors (OSCAR 2109 + Wikipedia + OpenSubtitles + WMT News Crawl)",
519
- "url":"https://github.com/explosion/spacy-vectors-builder",
520
- "license":"CC0",
521
- "author":"Explosion"
522
  }
523
  ],
524
  "requirements":[
 
1
  {
2
  "lang":"en",
3
  "name":"core_web_md",
4
+ "version":"3.2.0",
5
  "description":"English pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"MIT",
10
+ "spacy_version":">=3.2.0,<3.3.0",
11
+ "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":300,
14
  "vectors":20000,
15
+ "keys":684830,
16
  "name":"en_vectors"
17
  },
18
  "labels":{
 
68
  "WP$",
69
  "WRB",
70
  "XX",
 
71
  "``"
72
  ],
73
  "parser":[
 
117
  "relcl",
118
  "xcomp"
119
  ],
120
+ "senter":[
121
+ "I",
122
+ "S"
123
+ ],
124
  "attribute_ruler":[
125
 
126
  ],
 
169
  "senter"
170
  ],
171
  "performance":{
172
+ "token_acc":0.9993053983,
173
+ "token_p":0.9956742163,
174
+ "token_r":0.9957505887,
175
+ "token_f":0.9957124011,
176
+ "tag_acc":0.9736958159,
177
+ "sents_p":0.9144345238,
178
+ "sents_r":0.8918134442,
179
+ "sents_f":0.9029823331,
180
+ "dep_uas":0.9186827918,
181
+ "dep_las":0.9006556195,
182
  "dep_las_per_type":{
183
  "prep":{
184
+ "p":0.8569122175,
185
+ "r":0.8659836843,
186
+ "f":0.8614240691
187
  },
188
  "det":{
189
+ "p":0.9770765472,
190
+ "r":0.9784310528,
191
+ "f":0.9777533309
192
  },
193
  "pobj":{
194
+ "p":0.9611128429,
195
+ "r":0.968623601,
196
+ "f":0.9648536056
197
  },
198
  "nsubj":{
199
+ "p":0.9594312375,
200
+ "r":0.9459802848,
201
+ "f":0.9526582837
202
  },
203
  "aux":{
204
+ "p":0.9797621161,
205
+ "r":0.9826404344,
206
+ "f":0.9811991644
207
  },
208
  "advmod":{
209
+ "p":0.8561672709,
210
+ "r":0.8543664816,
211
+ "f":0.8552659283
212
  },
213
  "relcl":{
214
+ "p":0.765480427,
215
+ "r":0.780478955,
216
+ "f":0.772906935
217
  },
218
  "root":{
219
+ "p":0.9166215118,
220
+ "r":0.8927369879,
221
+ "f":0.9045216055
222
  },
223
  "xcomp":{
224
+ "p":0.8828097423,
225
+ "r":0.8977027997,
226
+ "f":0.8901939847
227
  },
228
  "amod":{
229
+ "p":0.92090506,
230
+ "r":0.9149983803,
231
+ "f":0.9179422183
232
  },
233
  "compound":{
234
+ "p":0.917950968,
235
+ "r":0.9321118289,
236
+ "f":0.924977203
237
  },
238
  "poss":{
239
+ "p":0.9744877461,
240
+ "r":0.9764492754,
241
+ "f":0.9754675246
242
  },
243
  "ccomp":{
244
+ "p":0.7754030746,
245
+ "r":0.8423625255,
246
+ "f":0.8074970715
247
  },
248
  "attr":{
249
+ "p":0.8974979822,
250
+ "r":0.9352396972,
251
+ "f":0.9159802306
252
  },
253
  "case":{
254
+ "p":0.9811881188,
255
+ "r":0.991991992,
256
+ "f":0.9865604778
257
  },
258
  "mark":{
259
+ "p":0.9043686734,
260
+ "r":0.8995760466,
261
+ "f":0.9019659936
262
  },
263
  "intj":{
264
+ "p":0.6650717703,
265
+ "r":0.610989011,
266
+ "f":0.636884307
267
  },
268
  "advcl":{
269
+ "p":0.6723033564,
270
+ "r":0.6607907328,
271
+ "f":0.666497333
272
  },
273
  "cc":{
274
+ "p":0.835978836,
275
+ "r":0.8314794881,
276
+ "f":0.8337230917
277
  },
278
  "neg":{
279
  "p":0.9431988042,
 
281
  "f":0.9465
282
  },
283
  "conj":{
284
+ "p":0.7615497433,
285
+ "r":0.7843655589,
286
+ "f":0.7727892844
287
  },
288
  "nsubjpass":{
289
+ "p":0.9269311065,
290
+ "r":0.9107692308,
291
+ "f":0.9187790998
292
  },
293
  "auxpass":{
294
+ "p":0.9508050089,
295
+ "r":0.9685649203,
296
+ "f":0.9596027985
297
  },
298
  "dobj":{
299
+ "p":0.9220839813,
300
+ "r":0.9449358515,
301
+ "f":0.9333700657
302
  },
303
  "nummod":{
304
+ "p":0.9399338254,
305
+ "r":0.9325757576,
306
+ "f":0.9362403346
307
  },
308
  "npadvmod":{
309
+ "p":0.7793445122,
310
+ "r":0.7264653641,
311
+ "f":0.7519764663
312
  },
313
  "prt":{
314
+ "p":0.8145094806,
315
+ "r":0.8853046595,
316
+ "f":0.8484328038
317
  },
318
  "pcomp":{
319
+ "p":0.8889679715,
320
+ "r":0.8746498599,
321
+ "f":0.8817507942
322
  },
323
  "expl":{
324
+ "p":0.983014862,
325
+ "r":0.9914346895,
326
+ "f":0.987206823
327
  },
328
  "acl":{
329
+ "p":0.7449741528,
330
+ "r":0.7075831969,
331
+ "f":0.7257974259
332
  },
333
  "agent":{
334
+ "p":0.8957264957,
335
+ "r":0.9390681004,
336
+ "f":0.9168853893
337
  },
338
  "dative":{
339
+ "p":0.7732997481,
340
+ "r":0.7041284404,
341
+ "f":0.7370948379
342
  },
343
  "acomp":{
344
+ "p":0.9094236048,
345
+ "r":0.9015873016,
346
+ "f":0.9054884992
347
  },
348
  "dep":{
349
+ "p":0.3909465021,
350
+ "r":0.1542207792,
351
+ "f":0.2211874272
352
  },
353
  "csubj":{
354
+ "p":0.8098591549,
355
+ "r":0.6804733728,
356
+ "f":0.7395498392
357
  },
358
  "quantmod":{
359
+ "p":0.8739800544,
360
+ "r":0.7831031682,
361
+ "f":0.8260497001
362
  },
363
  "nmod":{
364
+ "p":0.7614457831,
365
+ "r":0.5776965265,
366
+ "f":0.656964657
367
  },
368
  "appos":{
369
+ "p":0.6850678733,
370
+ "r":0.6568329718,
371
+ "f":0.6706533776
372
  },
373
  "predet":{
374
+ "p":0.8467741935,
375
  "r":0.9012875536,
376
+ "f":0.8731808732
377
  },
378
  "preconj":{
379
+ "p":0.5454545455,
380
+ "r":0.6279069767,
381
+ "f":0.5837837838
382
  },
383
  "oprd":{
384
+ "p":0.8413793103,
385
+ "r":0.728358209,
386
+ "f":0.7808
387
  },
388
  "parataxis":{
389
+ "p":0.6129943503,
390
+ "r":0.4707158351,
391
+ "f":0.5325153374
392
  },
393
  "meta":{
394
+ "p":0.8,
395
+ "r":0.3076923077,
396
+ "f":0.4444444444
397
  },
398
  "csubjpass":{
399
+ "p":0.5714285714,
400
+ "r":0.6666666667,
401
+ "f":0.6153846154
402
  }
403
  },
404
+ "ents_p":0.8531330602,
405
+ "ents_r":0.8448016827,
406
+ "ents_f":0.8489469314,
407
  "ents_per_type":{
408
  "DATE":{
409
+ "p":0.8645998102,
410
+ "r":0.8676190476,
411
+ "f":0.8661067977
412
  },
413
  "GPE":{
414
+ "p":0.9183846371,
415
+ "r":0.9071129707,
416
+ "f":0.9127140051
417
  },
418
  "ORDINAL":{
419
+ "p":0.7765363128,
420
+ "r":0.8633540373,
421
+ "f":0.8176470588
422
+ },
423
+ "PERSON":{
424
+ "p":0.8805737449,
425
+ "r":0.9216710183,
426
+ "f":0.9006538032
427
  },
428
  "ORG":{
429
+ "p":0.8025329543,
430
+ "r":0.8231707317,
431
+ "f":0.8127208481
432
  },
433
  "QUANTITY":{
434
+ "p":0.7697841727,
435
+ "r":0.5879120879,
436
+ "f":0.6666666667
437
  },
438
  "CARDINAL":{
439
+ "p":0.8279202279,
440
+ "r":0.8638525565,
441
+ "f":0.8455048007
442
  },
443
  "NORP":{
444
+ "p":0.9102667745,
445
+ "r":0.9008,
446
+ "f":0.905508645
447
  },
448
  "LOC":{
449
+ "p":0.7022058824,
450
+ "r":0.6082802548,
451
+ "f":0.6518771331
452
+ },
453
+ "FAC":{
454
+ "p":0.4122807018,
455
+ "r":0.3615384615,
456
+ "f":0.3852459016
457
  },
458
  "TIME":{
459
+ "p":0.7450980392,
460
+ "r":0.6666666667,
461
+ "f":0.7037037037
462
  },
463
+ "PRODUCT":{
464
+ "p":0.6376811594,
465
+ "r":0.2085308057,
466
+ "f":0.3142857143
467
  },
468
+ "MONEY":{
469
+ "p":0.9027611044,
470
+ "r":0.8878394333,
471
+ "f":0.8952380952
472
  },
473
  "EVENT":{
474
+ "p":0.6043956044,
475
+ "r":0.316091954,
476
+ "f":0.4150943396
477
+ },
478
+ "WORK_OF_ART":{
479
+ "p":0.5317460317,
480
+ "r":0.3453608247,
481
+ "f":0.41875
482
  },
483
  "LAW":{
484
+ "p":0.4666666667,
485
+ "r":0.328125,
486
+ "f":0.3853211009
487
  },
488
  "PERCENT":{
489
+ "p":0.9090909091,
490
+ "r":0.8728943338,
491
+ "f":0.890625
492
  },
493
  "LANGUAGE":{
494
+ "p":0.6956521739,
495
+ "r":0.5,
496
+ "f":0.5818181818
497
  }
498
  },
499
+ "speed":7620.1455610511
500
  },
501
  "sources":[
502
  {
 
518
  "license":"WordNet 3.0 License"
519
  },
520
  {
521
+ "name":"GloVe Common Crawl",
522
+ "url":"https://nlp.stanford.edu/projects/glove/",
523
+ "license":"Public Domain Dedication and License v1.0",
524
+ "author":"Jeffrey Pennington, Richard Socher, and Christopher D. Manning"
525
  }
526
  ],
527
  "requirements":[
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ebc6013eea110f8f48328010672e730cdd691a61e46e5b604df150b6b3a771ec
3
- size 6380943
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6cd7f9b1abdedfc181c608ba07ca44304f51c3d09e95e15f383c7cdc0ae3d4af
3
+ size 7106353
ner/moves CHANGED
@@ -1 +1 @@
1
- ��moves�{"0":{},"1":{"ORG":56516,"DATE":40493,"PERSON":36534,"GPE":26745,"MONEY":15158,"CARDINAL":14109,"NORP":9641,"PERCENT":9199,"WORK_OF_ART":4488,"LOC":4055,"TIME":3678,"QUANTITY":3123,"FAC":3046,"EVENT":3021,"ORDINAL":2142,"PRODUCT":1787,"LAW":1624,"LANGUAGE":355},"2":{"ORG":56516,"DATE":40493,"PERSON":36534,"GPE":26745,"MONEY":15158,"CARDINAL":14109,"NORP":9641,"PERCENT":9199,"WORK_OF_ART":4488,"LOC":4055,"TIME":3678,"QUANTITY":3123,"FAC":3046,"EVENT":3021,"ORDINAL":2142,"PRODUCT":1787,"LAW":1624,"LANGUAGE":355},"3":{"ORG":56516,"DATE":40493,"PERSON":36534,"GPE":26745,"MONEY":15158,"CARDINAL":14109,"NORP":9641,"PERCENT":9199,"WORK_OF_ART":4488,"LOC":4055,"TIME":3678,"QUANTITY":3123,"FAC":3046,"EVENT":3021,"ORDINAL":2142,"PRODUCT":1787,"LAW":1624,"LANGUAGE":355},"4":{"ORG":56516,"DATE":40493,"PERSON":36534,"GPE":26745,"MONEY":15158,"CARDINAL":14109,"NORP":9641,"PERCENT":9199,"WORK_OF_ART":4488,"LOC":4055,"TIME":3678,"QUANTITY":3123,"FAC":3046,"EVENT":3021,"ORDINAL":2142,"PRODUCT":1787,"LAW":1624,"LANGUAGE":355,"":1},"5":{"":1}}�cfg��neg_key�
 
1
+ ��moves�{"0":{},"1":{"ORG":56356,"DATE":40381,"PERSON":36475,"GPE":26716,"MONEY":15121,"CARDINAL":14096,"NORP":9638,"PERCENT":9182,"WORK_OF_ART":4475,"LOC":4047,"TIME":3670,"QUANTITY":3114,"FAC":3042,"EVENT":3015,"ORDINAL":2142,"PRODUCT":1782,"LAW":1620,"LANGUAGE":355},"2":{"ORG":56356,"DATE":40381,"PERSON":36475,"GPE":26716,"MONEY":15121,"CARDINAL":14096,"NORP":9638,"PERCENT":9182,"WORK_OF_ART":4475,"LOC":4047,"TIME":3670,"QUANTITY":3114,"FAC":3042,"EVENT":3015,"ORDINAL":2142,"PRODUCT":1782,"LAW":1620,"LANGUAGE":355},"3":{"ORG":56356,"DATE":40381,"PERSON":36475,"GPE":26716,"MONEY":15121,"CARDINAL":14096,"NORP":9638,"PERCENT":9182,"WORK_OF_ART":4475,"LOC":4047,"TIME":3670,"QUANTITY":3114,"FAC":3042,"EVENT":3015,"ORDINAL":2142,"PRODUCT":1782,"LAW":1620,"LANGUAGE":355},"4":{"ORG":56356,"DATE":40381,"PERSON":36475,"GPE":26716,"MONEY":15121,"CARDINAL":14096,"NORP":9638,"PERCENT":9182,"WORK_OF_ART":4475,"LOC":4047,"TIME":3670,"QUANTITY":3114,"FAC":3042,"EVENT":3015,"ORDINAL":2142,"PRODUCT":1782,"LAW":1620,"LANGUAGE":355,"":1},"5":{"":1}}�cfg��neg_key�
parser/model CHANGED
Binary files a/parser/model and b/parser/model differ
 
parser/moves CHANGED
@@ -1 +1,2 @@
1
- ��moves� {"0":{"":994332},"1":{"":999432},"2":{"det":172595,"nsubj":165748,"compound":116623,"amod":105184,"aux":86667,"punct":65478,"advmod":62763,"poss":36443,"mark":27941,"nummod":22598,"auxpass":15594,"prep":14001,"nsubjpass":13856,"neg":12357,"cc":10739,"nmod":9562,"advcl":9062,"npadvmod":8168,"quantmod":7101,"intj":6464,"ccomp":5896,"dobj":3427,"expl":3360,"dep":2871,"predet":1944,"parataxis":1837,"csubj":1428,"preconj":621,"pobj||prep":616,"attr":578,"meta":376,"advmod||conj":368,"dobj||xcomp":352,"acomp":284,"nsubj||ccomp":224,"dative":206,"advmod||xcomp":149,"dobj||ccomp":70,"csubjpass":64,"dobj||conj":62,"prep||conj":51,"acl":48,"prep||nsubj":41,"prep||dobj":36,"xcomp":34,"advmod||ccomp":32,"oprd":31},"3":{"punct":183790,"pobj":182191,"prep":174008,"dobj":89615,"conj":59687,"cc":51930,"ccomp":30385,"advmod":22861,"xcomp":21021,"relcl":20969,"advcl":19828,"attr":17741,"acomp":16922,"appos":15265,"case":13388,"acl":12085,"pcomp":10324,"dep":10116,"npadvmod":9796,"prt":8179,"agent":3903,"dative":3866,"nsubj":3470,"neg":2906,"amod":2839,"intj":2819,"nummod":2732,"oprd":2301,"parataxis":1261,"quantmod":319,"nmod":294,"acl||dobj":200,"prep||dobj":190,"prep||nsubj":162,"acl||nsubj":159,"appos||nsubj":145,"relcl||dobj":134,"relcl||nsubj":111,"aux":103,"expl":96,"meta":92,"appos||dobj":86,"preconj":71,"csubj":65,"prep||nsubjpass":55,"prep||advmod":54,"prep||acomp":53,"det":51,"nsubjpass":45,"relcl||pobj":42,"acl||nsubjpass":42,"mark":40,"auxpass":39,"prep||pobj":36,"relcl||nsubjpass":32,"appos||nsubjpass":31},"4":{"ROOT":111664}}�cfg��neg_key�
 
 
1
+ ��moves�
2
+ {"0":{"":995932},"1":{"":989662},"2":{"det":172430,"nsubj":165679,"compound":116803,"amod":106128,"aux":87078,"punct":65505,"advmod":62711,"poss":36427,"mark":27913,"nummod":22583,"auxpass":15597,"prep":13989,"nsubjpass":13867,"neg":12358,"cc":10694,"nmod":9572,"advcl":9063,"npadvmod":8135,"quantmod":7071,"intj":6557,"ccomp":5899,"dobj":3427,"expl":3360,"dep":3191,"predet":1945,"parataxis":1826,"csubj":1431,"preconj":620,"pobj||prep":615,"attr":578,"meta":448,"advmod||conj":367,"dobj||xcomp":352,"acomp":284,"nsubj||ccomp":224,"dative":206,"advmod||xcomp":149,"dobj||ccomp":70,"csubjpass":64,"dobj||conj":62,"prep||conj":51,"acl":48,"prep||nsubj":41,"prep||dobj":36,"xcomp":34,"advmod||ccomp":32,"oprd":31},"3":{"punct":183437,"pobj":182256,"prep":173845,"dobj":89650,"conj":59689,"cc":51858,"ccomp":30404,"advmod":22820,"xcomp":21045,"relcl":20968,"advcl":19833,"attr":17739,"acomp":16824,"appos":14963,"case":13361,"acl":12091,"pcomp":10345,"npadvmod":9702,"prt":8179,"agent":3884,"dative":3867,"nsubj":3465,"intj":2898,"neg":2871,"amod":2843,"nummod":2510,"oprd":2304,"dep":1518,"parataxis":1261,"quantmod":317,"nmod":296,"acl||dobj":202,"prep||dobj":190,"prep||nsubj":162,"acl||nsubj":159,"appos||nsubj":145,"relcl||dobj":134,"relcl||nsubj":111,"aux":103,"expl":96,"meta":93,"appos||dobj":86,"preconj":71,"csubj":65,"prep||nsubjpass":55,"prep||advmod":54,"prep||acomp":53,"det":51,"nsubjpass":45,"acl||nsubjpass":42,"relcl||pobj":41,"mark":40,"auxpass":39,"prep||pobj":36,"relcl||nsubjpass":32,"appos||nsubjpass":31},"4":{"ROOT":110979}}�cfg��neg_key�
senter/model CHANGED
Binary files a/senter/model and b/senter/model differ
 
tagger/cfg CHANGED
@@ -48,9 +48,7 @@
48
  "WP$",
49
  "WRB",
50
  "XX",
51
- "_SP",
52
  "``"
53
  ],
54
- "neg_prefix":"!",
55
  "overwrite":false
56
  }
 
48
  "WP$",
49
  "WRB",
50
  "XX",
 
51
  "``"
52
  ],
 
53
  "overwrite":false
54
  }
tagger/model CHANGED
Binary files a/tagger/model and b/tagger/model differ
 
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f6e27d50821c4144fea27377b8468b4d7e80051f355c52b9f695dd5fbc797241
3
- size 6495793
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ecc6afa27b4c6945e9d785433a4bf3e39c6b132dd4e4ebc95c121a3c66108c5d
3
+ size 6960804
tokenizer CHANGED
The diff for this file is too large to render. See raw diff
 
vocab/key2row CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:389912f67e81a52fbabb7edf8e36a0c3700b0b20d6dc6ef71bd56eb91ba08a0a
3
- size 6165224
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:72ef5a3f9d3d99e7eff1a2d7ec68531f494fba90aee4bbed3466a53aa28d8ebc
3
+ size 8216954
vocab/strings.json CHANGED
The diff for this file is too large to render. See raw diff
 
vocab/vectors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a7e833278b382d25fed830bb17531356fe739fc6d3a223bdea62c21e5a6d220a
3
  size 24000128
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7bf8c73207a783fe1831b85d8726662ac817ec3436fc142185e2cd27cf8e2730
3
  size 24000128