osanseviero (HF staff) committed on
Commit
bc4a187
1 Parent(s): 60867fb

Update spaCy pipeline

LICENSES_SOURCES CHANGED
@@ -1,4 +1,4 @@
- # UD Greek GDT v2.5
+ # UD Greek GDT v2.8

  * Author: Prokopidis, Prokopis
  * URL: https://github.com/UniversalDependencies/UD_Greek-GDT
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
4
  - token-classification
5
  language:
6
  - el
7
- license: cc-by-nc-sa-4.0
8
  model-index:
9
  - name: el_core_news_md
10
  results:
@@ -14,47 +14,47 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.7975206612
18
  - name: NER Recall
19
  type: recall
20
- value: 0.8109243697
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.8041666667
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
  - name: POS Accuracy
29
  type: accuracy
30
- value: 0.9331952867
31
  - task:
32
  name: SENTER
33
  type: token-classification
34
  metrics:
35
  - name: SENTER Precision
36
  type: precision
37
- value: 0.935483871
38
  - name: SENTER Recall
39
  type: recall
40
- value: 0.935483871
41
  - name: SENTER F Score
42
  type: f_score
43
- value: 0.935483871
44
  - task:
45
  name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
  - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.8805625755
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
  - name: Labeled Dependencies Accuracy
56
  type: accuracy
57
- value: 0.8805625755
58
  ---
59
  ### Details: https://spacy.io/models/el#el_core_news_md
60
 
@@ -63,12 +63,12 @@ Greek pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, se
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `el_core_news_md` |
66
- | **Version** | `3.1.0` |
67
- | **spaCy** | `>=3.1.0,<3.2.0` |
68
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
69
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
70
  | **Vectors** | 500000 keys, 20000 unique vectors (300 dimensions) |
71
- | **Sources** | [UD Greek GDT v2.5](https://github.com/UniversalDependencies/UD_Greek-GDT) (Prokopidis, Prokopis)<br />[Greek NER Corpus (Google Summer of Code 2018)](https://github.com/eellak/gsoc2018-spacy) (Giannis Daras)<br />[spaCy lookups data](https://github.com/explosion/spacy-lookups-data) (Explosion)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
72
  | **License** | `CC BY-NC-SA 3.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
74
 
@@ -92,15 +92,21 @@ Greek pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, se
92
  | Type | Score |
93
  | --- | --- |
94
  | `TOKEN_ACC` | 100.00 |
95
- | `TAG_ACC` | 93.32 |
96
- | `POS_ACC` | 96.32 |
97
- | `MORPH_ACC` | 90.78 |
98
- | `LEMMA_ACC` | 56.48 |
99
- | `DEP_UAS` | 88.06 |
100
- | `DEP_LAS` | 84.33 |
101
- | `SENTS_P` | 93.55 |
102
- | `SENTS_R` | 93.55 |
103
- | `SENTS_F` | 93.55 |
104
- | `ENTS_P` | 79.75 |
105
- | `ENTS_R` | 81.09 |
106
- | `ENTS_F` | 80.42 |
 
 
 
 
 
 
4
  - token-classification
5
  language:
6
  - el
7
+ license: cc-by-nc-sa-3.0
8
  model-index:
9
  - name: el_core_news_md
10
  results:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.7754237288
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.768907563
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.7721518987
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
  - name: POS Accuracy
29
  type: accuracy
30
+ value: 0.9309273776
31
  - task:
32
  name: SENTER
33
  type: token-classification
34
  metrics:
35
  - name: SENTER Precision
36
  type: precision
37
+ value: 0.9268292683
38
  - name: SENTER Recall
39
  type: recall
40
+ value: 0.9429280397
41
  - name: SENTER F Score
42
  type: f_score
43
+ value: 0.9348093481
44
  - task:
45
  name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
  - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
+ value: 0.8800087946
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
  - name: Labeled Dependencies Accuracy
56
  type: accuracy
57
+ value: 0.8800087946
58
  ---
59
  ### Details: https://spacy.io/models/el#el_core_news_md
60
 
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `el_core_news_md` |
66
+ | **Version** | `3.2.0` |
67
+ | **spaCy** | `>=3.2.0,<3.3.0` |
68
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
69
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
70
  | **Vectors** | 500000 keys, 20000 unique vectors (300 dimensions) |
71
+ | **Sources** | [UD Greek GDT v2.8](https://github.com/UniversalDependencies/UD_Greek-GDT) (Prokopidis, Prokopis)<br />[Greek NER Corpus (Google Summer of Code 2018)](https://github.com/eellak/gsoc2018-spacy) (Giannis Daras)<br />[spaCy lookups data](https://github.com/explosion/spacy-lookups-data) (Explosion)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
72
  | **License** | `CC BY-NC-SA 3.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
74
 
92
  | Type | Score |
93
  | --- | --- |
94
  | `TOKEN_ACC` | 100.00 |
95
+ | `TOKEN_P` | 99.90 |
96
+ | `TOKEN_R` | 99.95 |
97
+ | `TOKEN_F` | 99.93 |
98
+ | `POS_ACC` | 96.09 |
99
+ | `MORPH_ACC` | 90.83 |
100
+ | `MORPH_MICRO_P` | 96.07 |
101
+ | `MORPH_MICRO_R` | 96.01 |
102
+ | `MORPH_MICRO_F` | 96.04 |
103
+ | `SENTS_P` | 92.68 |
104
+ | `SENTS_R` | 94.29 |
105
+ | `SENTS_F` | 93.48 |
106
+ | `DEP_UAS` | 88.00 |
107
+ | `DEP_LAS` | 84.34 |
108
+ | `TAG_ACC` | 93.09 |
109
+ | `LEMMA_ACC` | 56.46 |
110
+ | `ENTS_P` | 77.54 |
111
+ | `ENTS_R` | 76.89 |
112
+ | `ENTS_F` | 77.22 |
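
The precision, recall and F-score rows in this README are tied together by the harmonic mean, F = 2PR / (P + R). The following is a minimal sketch, assuming the standard F1 definition; the helper name `f_score` is hypothetical and is not taken from spaCy. It recomputes the new `ENTS_F` and `SENTS_F` values from the precision and recall figures in the updated front matter.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the updated README front matter.
ents_f = f_score(0.7754237288, 0.768907563)
sents_f = f_score(0.9268292683, 0.9429280397)

print(f"ENTS_F  = {ents_f:.4f}")   # 0.7722
print(f"SENTS_F = {sents_f:.4f}")  # 0.9348
```

Both results agree with the `77.22` and `93.48` entries in the accuracy table once multiplied by 100.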
accuracy.json CHANGED
@@ -1,296 +1,130 @@
1
  {
2
  "token_acc": 1.0,
3
- "tag_acc": 0.9331952867,
4
- "pos_acc": 0.9631711285,
5
- "morph_acc": 0.907755263,
6
- "lemma_acc": 0.5648079673,
7
- "dep_uas": 0.8805625755,
8
- "dep_las": 0.8433139215,
9
- "sents_p": 0.935483871,
10
- "sents_r": 0.935483871,
11
- "sents_f": 0.935483871,
12
- "speed": 2641.746560651,
13
  "morph_per_feat": {
14
  "Abbr": {
15
- "p": 0.9166666667,
16
- "r": 0.8279569892,
17
- "f": 0.8700564972
18
  },
19
  "Case": {
20
- "p": 0.9353283458,
21
- "r": 0.9379793264,
22
- "f": 0.9366519604
23
  },
24
  "Gender": {
25
- "p": 0.9369908562,
26
- "r": 0.9396465488,
27
- "f": 0.9383168234
28
  },
29
  "Number": {
30
- "p": 0.9769784173,
31
  "r": 0.9800808314,
32
- "f": 0.9785271653
33
  },
34
  "Aspect": {
35
- "p": 0.9473684211,
36
- "r": 0.9397590361,
37
- "f": 0.9435483871
38
  },
39
  "Mood": {
40
- "p": 0.9893048128,
41
  "r": 0.9946236559,
42
- "f": 0.9919571046
43
  },
44
  "Person": {
45
- "p": 0.9765567766,
46
- "r": 0.9765567766,
47
- "f": 0.9765567766
48
  },
49
  "Tense": {
50
- "p": 0.9726918075,
51
- "r": 0.9752281617,
52
- "f": 0.9739583333
53
  },
54
  "VerbForm": {
55
- "p": 0.9827935223,
56
- "r": 0.9748995984,
57
- "f": 0.9788306452
58
  },
59
  "Voice": {
60
- "p": 0.967611336,
61
- "r": 0.9598393574,
62
- "f": 0.9637096774
63
  },
64
  "Definite": {
65
- "p": 0.9914627205,
66
- "r": 0.995997713,
67
- "f": 0.9937250428
68
  },
69
  "PronType": {
70
- "p": 0.9867398262,
71
- "r": 0.9880952381,
72
- "f": 0.987417067
73
  },
74
  "Foreign": {
75
- "p": 0.7555555556,
76
- "r": 0.6335403727,
77
- "f": 0.6891891892
78
  },
79
  "NumType": {
80
- "p": 0.9583333333,
81
- "r": 0.8975609756,
82
- "f": 0.9269521411
83
  },
84
  "Poss": {
85
- "p": 0.9032258065,
86
- "r": 0.9438202247,
87
- "f": 0.9230769231
88
  },
89
  "Degree": {
90
- "p": 0.8275862069,
91
- "r": 0.6315789474,
92
- "f": 0.7164179104
93
  }
94
  },
95
- "dep_las_per_type": {
96
- "root": {
97
- "p": 0.9032258065,
98
- "r": 0.9032258065,
99
- "f": 0.9032258065
100
- },
101
- "nmod": {
102
- "p": 0.79778157,
103
- "r": 0.8032646048,
104
- "f": 0.8005136986
105
- },
106
- "vocative": {
107
- "p": 0.8,
108
- "r": 0.5714285714,
109
- "f": 0.6666666667
110
- },
111
- "cc": {
112
- "p": 0.834375,
113
- "r": 0.8317757009,
114
- "f": 0.8330733229
115
- },
116
- "conj": {
117
- "p": 0.5302197802,
118
- "r": 0.5302197802,
119
- "f": 0.5302197802
120
- },
121
- "aux": {
122
- "p": 0.9776119403,
123
- "r": 0.9632352941,
124
- "f": 0.9703703704
125
- },
126
- "advmod": {
127
- "p": 0.7658402204,
128
- "r": 0.7830985915,
129
- "f": 0.7743732591
130
- },
131
- "ccomp": {
132
- "p": 0.7916666667,
133
- "r": 0.8260869565,
134
- "f": 0.8085106383
135
- },
136
- "det": {
137
- "p": 0.9586243955,
138
- "r": 0.9695652174,
139
- "f": 0.9640637665
140
- },
141
- "obj": {
142
- "p": 0.8357348703,
143
- "r": 0.8814589666,
144
- "f": 0.8579881657
145
- },
146
- "flat": {
147
- "p": 0.6862745098,
148
- "r": 0.7070707071,
149
- "f": 0.6965174129
150
- },
151
- "case": {
152
- "p": 0.958554729,
153
- "r": 0.9636752137,
154
- "f": 0.9611081513
155
- },
156
- "amod": {
157
- "p": 0.9072929543,
158
- "r": 0.890776699,
159
- "f": 0.8989589712
160
- },
161
- "obl": {
162
- "p": 0.786377709,
163
- "r": 0.8,
164
- "f": 0.7931303669
165
- },
166
- "acl:relcl": {
167
- "p": 0.7904191617,
168
- "r": 0.7173913043,
169
- "f": 0.7521367521
170
- },
171
- "mark": {
172
- "p": 0.8888888889,
173
- "r": 0.8951048951,
174
- "f": 0.8919860627
175
- },
176
- "nsubj:pass": {
177
- "p": 0.7677419355,
178
- "r": 0.7212121212,
179
- "f": 0.74375
180
- },
181
- "nsubj": {
182
- "p": 0.7571428571,
183
- "r": 0.7429906542,
184
- "f": 0.75
185
- },
186
- "cop": {
187
- "p": 0.7474747475,
188
- "r": 0.7326732673,
189
- "f": 0.74
190
- },
191
- "parataxis": {
192
- "p": 0.2222222222,
193
- "r": 0.2352941176,
194
- "f": 0.2285714286
195
- },
196
- "nummod": {
197
- "p": 0.9125,
198
- "r": 0.8795180723,
199
- "f": 0.8957055215
200
- },
201
- "advcl": {
202
- "p": 0.4758064516,
203
- "r": 0.5566037736,
204
- "f": 0.5130434783
205
- },
206
- "xcomp": {
207
- "p": 0.7333333333,
208
- "r": 0.6547619048,
209
- "f": 0.6918238994
210
- },
211
- "csubj": {
212
- "p": 0.7142857143,
213
- "r": 0.4545454545,
214
- "f": 0.5555555556
215
- },
216
- "acl": {
217
- "p": 0.6896551724,
218
- "r": 0.4545454545,
219
- "f": 0.5479452055
220
- },
221
- "compound": {
222
- "p": 0.0,
223
- "r": 0.0,
224
- "f": 0.0
225
- },
226
- "appos": {
227
- "p": 0.3111111111,
228
- "r": 0.2857142857,
229
- "f": 0.2978723404
230
- },
231
- "fixed": {
232
- "p": 0.4285714286,
233
- "r": 0.4285714286,
234
- "f": 0.4285714286
235
- },
236
- "csubj:pass": {
237
- "p": 0.8,
238
- "r": 0.6666666667,
239
- "f": 0.7272727273
240
- },
241
- "obl:agent": {
242
- "p": 0.5294117647,
243
- "r": 0.36,
244
- "f": 0.4285714286
245
- },
246
- "dep": {
247
- "p": 0.0,
248
- "r": 0.0,
249
- "f": 0.0
250
- },
251
- "orphan": {
252
- "p": 0.0,
253
- "r": 0.0,
254
- "f": 0.0
255
- },
256
- "iobj": {
257
- "p": 1.0,
258
- "r": 1.0,
259
- "f": 1.0
260
- },
261
- "expl": {
262
- "p": 0.0,
263
- "r": 0.0,
264
- "f": 0.0
265
- }
266
- },
267
- "ents_p": 0.7975206612,
268
- "ents_r": 0.8109243697,
269
- "ents_f": 0.8041666667,
270
  "ents_per_type": {
271
  "PERSON": {
272
- "p": 0.8923076923,
273
- "r": 0.90625,
274
- "f": 0.8992248062
275
  },
276
  "GPE": {
277
- "p": 0.8404255319,
278
  "r": 0.908045977,
279
- "f": 0.8729281768
280
  },
281
  "ORG": {
282
- "p": 0.7042253521,
283
- "r": 0.7042253521,
284
- "f": 0.7042253521
285
  },
286
  "PRODUCT": {
287
- "p": 0.75,
288
- "r": 0.375,
289
- "f": 0.5
290
  },
291
  "EVENT": {
292
- "p": 0.5,
293
- "r": 0.5,
294
  "f": 0.5
295
  },
296
  "LOC": {
@@ -298,5 +132,6 @@
298
  "r": 0.0,
299
  "f": 0.0
300
  }
301
- }
 
302
  }
1
  {
2
  "token_acc": 1.0,
3
+ "token_p": 0.9990295973,
4
+ "token_r": 0.9995068547,
5
+ "token_f": 0.9992604644,
6
+ "pos_acc": 0.9609032194,
7
+ "morph_acc": 0.9083468915,
8
+ "morph_micro_p": 0.9607098022,
9
+ "morph_micro_r": 0.9600583189,
10
+ "morph_micro_f": 0.9603839501,
 
 
11
  "morph_per_feat": {
12
  "Abbr": {
13
+ "p": 0.974025974,
14
+ "r": 0.8064516129,
15
+ "f": 0.8823529412
16
  },
17
  "Case": {
18
+ "p": 0.9366899302,
19
+ "r": 0.9398132711,
20
+ "f": 0.9382490013
21
  },
22
  "Gender": {
23
+ "p": 0.9376869392,
24
+ "r": 0.9408136045,
25
+ "f": 0.9392476698
26
  },
27
  "Number": {
28
+ "p": 0.9772596431,
29
  "r": 0.9800808314,
30
+ "f": 0.9786682041
31
  },
32
  "Aspect": {
33
+ "p": 0.9503546099,
34
+ "r": 0.9417670683,
35
+ "f": 0.9460413515
36
  },
37
  "Mood": {
38
+ "p": 0.9946236559,
39
  "r": 0.9946236559,
40
+ "f": 0.9946236559
41
  },
42
  "Person": {
43
+ "p": 0.9816176471,
44
+ "r": 0.978021978,
45
+ "f": 0.9798165138
46
  },
47
  "Tense": {
48
+ "p": 0.9713541667,
49
+ "r": 0.9726205997,
50
+ "f": 0.9719869707
51
  },
52
  "VerbForm": {
53
+ "p": 0.9868287741,
54
+ "r": 0.9779116466,
55
+ "f": 0.9823499748
56
  },
57
  "Voice": {
58
+ "p": 0.9716312057,
59
+ "r": 0.9628514056,
60
+ "f": 0.9672213817
61
  },
62
  "Definite": {
63
+ "p": 0.9909039227,
64
+ "r": 0.9965694683,
65
+ "f": 0.9937286203
66
  },
67
  "PronType": {
68
+ "p": 0.9840109639,
69
+ "r": 0.9862637363,
70
+ "f": 0.9851360622
71
  },
72
  "Foreign": {
73
+ "p": 0.7769230769,
74
+ "r": 0.6273291925,
75
+ "f": 0.6941580756
76
  },
77
  "NumType": {
78
+ "p": 0.9739583333,
79
+ "r": 0.912195122,
80
+ "f": 0.9420654912
81
  },
82
  "Poss": {
83
+ "p": 0.9139784946,
84
+ "r": 0.9550561798,
85
+ "f": 0.9340659341
86
  },
87
  "Degree": {
88
+ "p": 0.7666666667,
89
+ "r": 0.6052631579,
90
+ "f": 0.6764705882
91
  }
92
  },
93
+ "sents_p": 0.9268292683,
94
+ "sents_r": 0.9429280397,
95
+ "sents_f": 0.9348093481,
96
+ "dep_uas": 0.8800087946,
97
+ "dep_las": 0.8434013082,
98
+ "dep_las_per_type": {},
99
+ "tag_acc": 0.9309273776,
100
+ "lemma_acc": 0.5646107578,
101
+ "ents_p": 0.7754237288,
102
+ "ents_r": 0.768907563,
103
+ "ents_f": 0.7721518987,
104
  "ents_per_type": {
105
  "PERSON": {
106
+ "p": 0.8888888889,
107
+ "r": 0.875,
108
+ "f": 0.8818897638
109
  },
110
  "GPE": {
111
+ "p": 0.8229166667,
112
  "r": 0.908045977,
113
+ "f": 0.8633879781
114
  },
115
  "ORG": {
116
+ "p": 0.6338028169,
117
+ "r": 0.6338028169,
118
+ "f": 0.6338028169
119
  },
120
  "PRODUCT": {
121
+ "p": 0.25,
122
+ "r": 0.125,
123
+ "f": 0.1666666667
124
  },
125
  "EVENT": {
126
+ "p": 1.0,
127
+ "r": 0.3333333333,
128
  "f": 0.5
129
  },
130
  "LOC": {
132
  "r": 0.0,
133
  "f": 0.0
134
  }
135
+ },
136
+ "speed": 2386.8488947681
137
  }
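
The README accuracy table appears to be the `accuracy.json` scores scaled to percentages with two decimals. The sketch below illustrates that relationship on a handful of keys copied from the new file. The helper `to_table_row` is a hypothetical name, and the formatting rule is inferred from the numbers in this diff rather than from spaCy's own tooling.

```python
import json

# Excerpt of the new accuracy.json; only a few keys are reproduced here.
ACCURACY_JSON = """
{
  "token_acc": 1.0,
  "pos_acc": 0.9609032194,
  "dep_uas": 0.8800087946,
  "dep_las": 0.8434013082,
  "lemma_acc": 0.5646107578
}
"""


def to_table_row(key: str, value: float) -> str:
    """Format one score as a Markdown table row in the README's style."""
    return f"| `{key.upper()}` | {value * 100:.2f} |"


scores = json.loads(ACCURACY_JSON)
rows = [to_table_row(key, value) for key, value in scores.items()]
print("\n".join(rows))
```

The printed rows can be compared line by line with the `TOKEN_ACC`, `POS_ACC`, `DEP_UAS`, `DEP_LAS` and `LEMMA_ACC` entries in the README table.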
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
config.cfg CHANGED
@@ -1,10 +1,8 @@
1
  [paths]
2
- train = "corpus/el-dep-news/train.spacy"
3
- dev = "corpus/el-dep-news/dev.spacy"
4
- vectors = "corpus/el_vectors"
5
- raw = null
6
  init_tok2vec = null
7
- vocab_data = null
8
 
9
  [system]
10
  gpu_allocator = null
@@ -24,6 +22,7 @@ tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
24
 
25
  [components.attribute_ruler]
26
  factory = "attribute_ruler"
 
27
  validate = false
28
 
29
  [components.lemmatizer]
@@ -31,9 +30,13 @@ factory = "lemmatizer"
31
  mode = "rule"
32
  model = null
33
  overwrite = false
 
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
 
 
 
37
 
38
  [components.morphologizer.model]
39
  @architectures = "spacy.Tagger.v1"
@@ -48,6 +51,7 @@ upstream = "tok2vec"
48
  factory = "ner"
49
  incorrect_spans_key = null
50
  moves = null
 
51
  update_with_oracle_cut_size = 100
52
 
53
  [components.ner.model]
@@ -65,8 +69,8 @@ nO = null
65
  [components.ner.model.tok2vec.embed]
66
  @architectures = "spacy.MultiHashEmbed.v2"
67
  width = 96
68
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
69
- rows = [5000,2500,2500,2500]
70
  include_static_vectors = true
71
 
72
  [components.ner.model.tok2vec.encode]
@@ -81,6 +85,7 @@ factory = "parser"
81
  learn_tokens = false
82
  min_action_freq = 30
83
  moves = null
 
84
  update_with_oracle_cut_size = 100
85
 
86
  [components.parser.model]
@@ -99,6 +104,8 @@ upstream = "tok2vec"
99
 
100
  [components.senter]
101
  factory = "senter"
 
 
102
 
103
  [components.senter.model]
104
  @architectures = "spacy.Tagger.v1"
@@ -110,8 +117,8 @@ nO = null
110
  [components.senter.model.tok2vec.embed]
111
  @architectures = "spacy.MultiHashEmbed.v2"
112
  width = 16
113
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
114
- rows = [1000,500,500,500]
115
  include_static_vectors = true
116
 
117
  [components.senter.model.tok2vec.encode]
@@ -130,8 +137,8 @@ factory = "tok2vec"
130
  [components.tok2vec.model.embed]
131
  @architectures = "spacy.MultiHashEmbed.v2"
132
  width = ${components.tok2vec.model.encode:width}
133
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
134
- rows = [5000,2500,2500,2500]
135
  include_static_vectors = true
136
 
137
  [components.tok2vec.model.encode]
@@ -145,22 +152,19 @@ maxout_pieces = 3
145
 
146
  [corpora.dev]
147
  @readers = "spacy.Corpus.v1"
148
- limit = 0
149
- max_length = 0
150
- path = ${paths:dev}
151
  gold_preproc = false
 
 
152
  augmenter = null
153
 
154
  [corpora.train]
155
  @readers = "spacy.Corpus.v1"
156
- path = ${paths:train}
157
- max_length = 5000
158
  gold_preproc = false
 
159
  limit = 0
160
-
161
- [corpora.train.augmenter]
162
- @augmenters = "spacy.lower_case.v1"
163
- level = 0.1
164
 
165
  [training]
166
  train_corpus = "corpora.train"
@@ -191,9 +195,8 @@ compound = 1.001
191
  t = 0.0
192
 
193
  [training.logger]
194
- @loggers = "spacy.WandbLogger.v1"
195
- project_name = "spacy-v3.0.0a2"
196
- remove_config_values = []
197
 
198
  [training.optimizer]
199
  @optimizers = "Adam.v1"
@@ -216,16 +219,17 @@ dep_las_per_type = null
216
  sents_p = null
217
  sents_r = null
218
  sents_f = 0.02
219
- lemma_acc = 0.33
220
- ents_f = 0.33
221
  ents_p = 0.0
222
  ents_r = 0.0
223
  ents_per_type = null
 
224
 
225
  [pretraining]
226
 
227
  [initialize]
228
- vocab_data = ${paths.vocab_data}
229
  vectors = ${paths.vectors}
230
  init_tok2vec = ${paths.init_tok2vec}
231
  before_init = null
1
  [paths]
2
+ train = null
3
+ dev = null
4
+ vectors = null
 
5
  init_tok2vec = null
 
6
 
7
  [system]
8
  gpu_allocator = null
22
 
23
  [components.attribute_ruler]
24
  factory = "attribute_ruler"
25
+ scorer = {"@scorers":"spacy.attribute_ruler_scorer.v1"}
26
  validate = false
27
 
28
  [components.lemmatizer]
30
  mode = "rule"
31
  model = null
32
  overwrite = false
33
+ scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
37
+ extend = false
38
+ overwrite = true
39
+ scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
40
 
41
  [components.morphologizer.model]
42
  @architectures = "spacy.Tagger.v1"
51
  factory = "ner"
52
  incorrect_spans_key = null
53
  moves = null
54
+ scorer = {"@scorers":"spacy.ner_scorer.v1"}
55
  update_with_oracle_cut_size = 100
56
 
57
  [components.ner.model]
69
  [components.ner.model.tok2vec.embed]
70
  @architectures = "spacy.MultiHashEmbed.v2"
71
  width = 96
72
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
73
+ rows = [5000,2500,2500,2500,100]
74
  include_static_vectors = true
75
 
76
  [components.ner.model.tok2vec.encode]
85
  learn_tokens = false
86
  min_action_freq = 30
87
  moves = null
88
+ scorer = {"@scorers":"spacy.parser_scorer.v1"}
89
  update_with_oracle_cut_size = 100
90
 
91
  [components.parser.model]
104
 
105
  [components.senter]
106
  factory = "senter"
107
+ overwrite = false
108
+ scorer = {"@scorers":"spacy.senter_scorer.v1"}
109
 
110
  [components.senter.model]
111
  @architectures = "spacy.Tagger.v1"
117
  [components.senter.model.tok2vec.embed]
118
  @architectures = "spacy.MultiHashEmbed.v2"
119
  width = 16
120
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
121
+ rows = [1000,500,500,500,50]
122
  include_static_vectors = true
123
 
124
  [components.senter.model.tok2vec.encode]
137
  [components.tok2vec.model.embed]
138
  @architectures = "spacy.MultiHashEmbed.v2"
139
  width = ${components.tok2vec.model.encode:width}
140
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
141
+ rows = [5000,2500,2500,2500,100]
142
  include_static_vectors = true
143
 
144
  [components.tok2vec.model.encode]
152
 
153
  [corpora.dev]
154
  @readers = "spacy.Corpus.v1"
155
+ path = ${paths.dev}
 
 
156
  gold_preproc = false
157
+ max_length = 0
158
+ limit = 0
159
  augmenter = null
160
 
161
  [corpora.train]
162
  @readers = "spacy.Corpus.v1"
163
+ path = ${paths.train}
 
164
  gold_preproc = false
165
+ max_length = 0
166
  limit = 0
167
+ augmenter = null
 
 
 
168
 
169
  [training]
170
  train_corpus = "corpora.train"
195
  t = 0.0
196
 
197
  [training.logger]
198
+ @loggers = "spacy.ConsoleLogger.v1"
199
+ progress_bar = false
 
200
 
201
  [training.optimizer]
202
  @optimizers = "Adam.v1"
219
  sents_p = null
220
  sents_r = null
221
  sents_f = 0.02
222
+ lemma_acc = 0.5
223
+ ents_f = 0.16
224
  ents_p = 0.0
225
  ents_r = 0.0
226
  ents_per_type = null
227
+ speed = 0.0
228
 
229
  [pretraining]
230
 
231
  [initialize]
232
+ vocab_data = null
233
  vectors = ${paths.vectors}
234
  init_tok2vec = ${paths.init_tok2vec}
235
  before_init = null
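
In the embed blocks above, `attrs` gains a fifth entry (`SPACY`) and `rows` gains a fifth number, so the two lists stay paired one to one: each attribute gets its own embedding table with the listed row count. The sketch below checks that pairing on two fragments copied from the new config. It uses the standard-library `configparser` as a stand-in, which is an assumption made for illustration: spaCy reads its config with its own parser, and `embed_tables` is a hypothetical helper.

```python
import configparser
import json

# Two embed blocks copied from the new config.cfg (other keys omitted).
CONFIG_FRAGMENT = """
[components.ner.model.tok2vec.embed]
width = 96
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
rows = [5000,2500,2500,2500,100]

[components.senter.model.tok2vec.embed]
width = 16
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
rows = [1000,500,500,500,50]
"""


def embed_tables(config_text: str) -> dict:
    """Map each section to {attribute: row count}, failing on a length mismatch."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    tables = {}
    for section in parser.sections():
        # The list values happen to be valid JSON, so json.loads can read them.
        attrs = json.loads(parser[section]["attrs"])
        rows = json.loads(parser[section]["rows"])
        if len(attrs) != len(rows):
            raise ValueError(f"{section}: {len(attrs)} attrs but {len(rows)} rows")
        tables[section] = dict(zip(attrs, rows))
    return tables


tables = embed_tables(CONFIG_FRAGMENT)
for section, table in tables.items():
    print(section, table)
```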
el_core_news_md-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:723fd52520965c134839e86bf51d65a9e7e975c0c6397bf147949f8e4b3e2d1d
- size 43375757
+ oid sha256:2b923dada813ea464f117d938de3027281a015e71d23eb08e1042de32cbb37ad
+ size 44151746
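
The wheel is stored through Git LFS, so the diff shows only the three-line pointer file (`version`, `oid`, `size`) rather than the binary itself. The sketch below parses the old and new pointers shown above and reports how much the wheel grew. `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS client.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:723fd52520965c134839e86bf51d65a9e7e975c0c6397bf147949f8e4b3e2d1d
size 43375757
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2b923dada813ea464f117d938de3027281a015e71d23eb08e1042de32cbb37ad
size 44151746
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
growth = new["size"] - old["size"]
print(f"wheel grew by {growth} bytes")  # wheel grew by 775989 bytes
```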
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"el",
3
  "name":"core_news_md",
4
- "version":"3.1.0",
5
  "description":"Greek pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-NC-SA 3.0",
10
- "spacy_version":">=3.1.0,<3.2.0",
11
- "spacy_git_version":"caba63b74",
12
  "vectors":{
13
  "width":300,
14
  "vectors":20000,
@@ -452,297 +452,133 @@
452
  ],
453
  "performance":{
454
  "token_acc":1.0,
455
- "tag_acc":0.9331952867,
456
- "pos_acc":0.9631711285,
457
- "morph_acc":0.907755263,
458
- "lemma_acc":0.5648079673,
459
- "dep_uas":0.8805625755,
460
- "dep_las":0.8433139215,
461
- "sents_p":0.935483871,
462
- "sents_r":0.935483871,
463
- "sents_f":0.935483871,
464
- "speed":2641.746560651,
465
  "morph_per_feat":{
466
  "Abbr":{
467
- "p":0.9166666667,
468
- "r":0.8279569892,
469
- "f":0.8700564972
470
  },
471
  "Case":{
472
- "p":0.9353283458,
473
- "r":0.9379793264,
474
- "f":0.9366519604
475
  },
476
  "Gender":{
477
- "p":0.9369908562,
478
- "r":0.9396465488,
479
- "f":0.9383168234
480
  },
481
  "Number":{
482
- "p":0.9769784173,
483
  "r":0.9800808314,
484
- "f":0.9785271653
485
  },
486
  "Aspect":{
487
- "p":0.9473684211,
488
- "r":0.9397590361,
489
- "f":0.9435483871
490
  },
491
  "Mood":{
492
- "p":0.9893048128,
493
  "r":0.9946236559,
494
- "f":0.9919571046
495
  },
496
  "Person":{
497
- "p":0.9765567766,
498
- "r":0.9765567766,
499
- "f":0.9765567766
500
  },
501
  "Tense":{
502
- "p":0.9726918075,
503
- "r":0.9752281617,
504
- "f":0.9739583333
505
  },
506
  "VerbForm":{
507
- "p":0.9827935223,
508
- "r":0.9748995984,
509
- "f":0.9788306452
510
  },
511
  "Voice":{
512
- "p":0.967611336,
513
- "r":0.9598393574,
514
- "f":0.9637096774
515
  },
516
  "Definite":{
517
- "p":0.9914627205,
518
- "r":0.995997713,
519
- "f":0.9937250428
520
  },
521
  "PronType":{
522
- "p":0.9867398262,
523
- "r":0.9880952381,
524
- "f":0.987417067
525
  },
526
  "Foreign":{
527
- "p":0.7555555556,
528
- "r":0.6335403727,
529
- "f":0.6891891892
530
  },
531
  "NumType":{
532
- "p":0.9583333333,
533
- "r":0.8975609756,
534
- "f":0.9269521411
535
  },
536
  "Poss":{
537
- "p":0.9032258065,
538
- "r":0.9438202247,
539
- "f":0.9230769231
540
  },
541
  "Degree":{
542
- "p":0.8275862069,
543
- "r":0.6315789474,
544
- "f":0.7164179104
545
  }
546
  },
 
 
 
 
 
547
  "dep_las_per_type":{
548
- "root":{
549
- "p":0.9032258065,
550
- "r":0.9032258065,
551
- "f":0.9032258065
552
- },
553
- "nmod":{
554
- "p":0.79778157,
555
- "r":0.8032646048,
556
- "f":0.8005136986
557
- },
558
- "vocative":{
559
- "p":0.8,
560
- "r":0.5714285714,
561
- "f":0.6666666667
562
- },
563
- "cc":{
564
- "p":0.834375,
565
- "r":0.8317757009,
566
- "f":0.8330733229
567
- },
568
- "conj":{
569
- "p":0.5302197802,
570
- "r":0.5302197802,
571
- "f":0.5302197802
572
- },
573
- "aux":{
574
- "p":0.9776119403,
575
- "r":0.9632352941,
576
- "f":0.9703703704
577
- },
578
- "advmod":{
579
- "p":0.7658402204,
580
- "r":0.7830985915,
581
- "f":0.7743732591
582
- },
583
- "ccomp":{
584
- "p":0.7916666667,
585
- "r":0.8260869565,
586
- "f":0.8085106383
587
- },
588
- "det":{
589
- "p":0.9586243955,
590
- "r":0.9695652174,
591
- "f":0.9640637665
592
- },
593
- "obj":{
594
- "p":0.8357348703,
595
- "r":0.8814589666,
596
- "f":0.8579881657
597
- },
598
- "flat":{
599
- "p":0.6862745098,
600
- "r":0.7070707071,
601
- "f":0.6965174129
602
- },
603
- "case":{
604
- "p":0.958554729,
605
- "r":0.9636752137,
606
- "f":0.9611081513
607
- },
608
- "amod":{
609
- "p":0.9072929543,
610
- "r":0.890776699,
611
- "f":0.8989589712
612
- },
613
- "obl":{
614
- "p":0.786377709,
615
- "r":0.8,
616
- "f":0.7931303669
617
- },
618
- "acl:relcl":{
619
- "p":0.7904191617,
620
- "r":0.7173913043,
621
- "f":0.7521367521
622
- },
623
- "mark":{
624
- "p":0.8888888889,
625
- "r":0.8951048951,
626
- "f":0.8919860627
627
- },
628
- "nsubj:pass":{
629
- "p":0.7677419355,
630
- "r":0.7212121212,
631
- "f":0.74375
632
- },
633
- "nsubj":{
634
- "p":0.7571428571,
635
- "r":0.7429906542,
636
- "f":0.75
637
- },
638
- "cop":{
639
- "p":0.7474747475,
640
- "r":0.7326732673,
641
- "f":0.74
642
- },
643
- "parataxis":{
644
- "p":0.2222222222,
645
- "r":0.2352941176,
646
- "f":0.2285714286
647
- },
648
- "nummod":{
649
- "p":0.9125,
650
- "r":0.8795180723,
651
- "f":0.8957055215
652
- },
653
- "advcl":{
654
- "p":0.4758064516,
655
- "r":0.5566037736,
656
- "f":0.5130434783
657
- },
658
- "xcomp":{
659
- "p":0.7333333333,
660
- "r":0.6547619048,
661
- "f":0.6918238994
662
- },
663
- "csubj":{
664
- "p":0.7142857143,
665
- "r":0.4545454545,
666
- "f":0.5555555556
667
- },
668
- "acl":{
669
- "p":0.6896551724,
670
- "r":0.4545454545,
671
- "f":0.5479452055
672
- },
673
- "compound":{
674
- "p":0.0,
675
- "r":0.0,
676
- "f":0.0
677
- },
678
- "appos":{
679
- "p":0.3111111111,
680
- "r":0.2857142857,
681
- "f":0.2978723404
682
- },
683
- "fixed":{
684
- "p":0.4285714286,
685
- "r":0.4285714286,
686
- "f":0.4285714286
687
- },
688
- "csubj:pass":{
689
- "p":0.8,
690
- "r":0.6666666667,
691
- "f":0.7272727273
692
- },
693
- "obl:agent":{
694
- "p":0.5294117647,
695
- "r":0.36,
696
- "f":0.4285714286
697
- },
698
- "dep":{
699
- "p":0.0,
700
- "r":0.0,
701
- "f":0.0
702
- },
703
- "orphan":{
704
- "p":0.0,
705
- "r":0.0,
706
- "f":0.0
707
- },
708
- "iobj":{
709
- "p":1.0,
710
- "r":1.0,
711
- "f":1.0
712
- },
713
- "expl":{
714
- "p":0.0,
715
- "r":0.0,
716
- "f":0.0
717
- }
718
  },
719
- "ents_p":0.7975206612,
720
- "ents_r":0.8109243697,
721
- "ents_f":0.8041666667,
 
 
722
  "ents_per_type":{
723
  "PERSON":{
724
- "p":0.8923076923,
725
- "r":0.90625,
726
- "f":0.8992248062
727
  },
728
  "GPE":{
729
- "p":0.8404255319,
730
  "r":0.908045977,
731
- "f":0.8729281768
732
  },
733
  "ORG":{
734
- "p":0.7042253521,
735
- "r":0.7042253521,
736
- "f":0.7042253521
737
  },
738
  "PRODUCT":{
739
- "p":0.75,
740
- "r":0.375,
741
- "f":0.5
742
  },
743
  "EVENT":{
744
- "p":0.5,
745
- "r":0.5,
746
  "f":0.5
747
  },
748
  "LOC":{
@@ -750,11 +586,12 @@
750
  "r":0.0,
751
  "f":0.0
752
  }
753
- }
 
754
  },
755
  "sources":[
756
  {
757
- "name":"UD Greek GDT v2.5",
758
  "url":"https://github.com/UniversalDependencies/UD_Greek-GDT",
759
  "license":"CC BY-NC-SA 3.0",
760
  "author":"Prokopidis, Prokopis"
1
  {
2
  "lang":"el",
3
  "name":"core_news_md",
4
+ "version":"3.2.0",
5
  "description":"Greek pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-NC-SA 3.0",
10
+ "spacy_version":">=3.2.0,<3.3.0",
11
+ "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":300,
14
  "vectors":20000,
452
  ],
453
  "performance":{
454
  "token_acc":1.0,
455
+ "token_p":0.9990295973,
456
+ "token_r":0.9995068547,
457
+ "token_f":0.9992604644,
458
+ "pos_acc":0.9609032194,
459
+ "morph_acc":0.9083468915,
460
+ "morph_micro_p":0.9607098022,
461
+ "morph_micro_r":0.9600583189,
462
+ "morph_micro_f":0.9603839501,
 
 
463
  "morph_per_feat":{
464
  "Abbr":{
465
+ "p":0.974025974,
466
+ "r":0.8064516129,
467
+ "f":0.8823529412
468
  },
469
  "Case":{
470
+ "p":0.9366899302,
471
+ "r":0.9398132711,
472
+ "f":0.9382490013
473
  },
474
  "Gender":{
475
+ "p":0.9376869392,
476
+ "r":0.9408136045,
477
+ "f":0.9392476698
478
  },
479
  "Number":{
480
+ "p":0.9772596431,
481
  "r":0.9800808314,
482
+ "f":0.9786682041
483
  },
484
  "Aspect":{
485
+ "p":0.9503546099,
486
+ "r":0.9417670683,
487
+ "f":0.9460413515
488
  },
489
  "Mood":{
490
+ "p":0.9946236559,
491
  "r":0.9946236559,
492
+ "f":0.9946236559
493
  },
494
  "Person":{
495
+ "p":0.9816176471,
496
+ "r":0.978021978,
497
+ "f":0.9798165138
498
  },
499
  "Tense":{
500
+ "p":0.9713541667,
501
+ "r":0.9726205997,
502
+ "f":0.9719869707
503
  },
504
  "VerbForm":{
505
+ "p":0.9868287741,
506
+ "r":0.9779116466,
507
+ "f":0.9823499748
508
  },
509
  "Voice":{
510
+ "p":0.9716312057,
511
+ "r":0.9628514056,
512
+ "f":0.9672213817
513
  },
514
  "Definite":{
515
+ "p":0.9909039227,
516
+ "r":0.9965694683,
517
+ "f":0.9937286203
518
  },
519
  "PronType":{
520
+ "p":0.9840109639,
521
+ "r":0.9862637363,
522
+ "f":0.9851360622
523
  },
524
  "Foreign":{
525
+ "p":0.7769230769,
526
+ "r":0.6273291925,
527
+ "f":0.6941580756
528
  },
529
  "NumType":{
530
+ "p":0.9739583333,
531
+ "r":0.912195122,
532
+ "f":0.9420654912
533
  },
534
  "Poss":{
535
+ "p":0.9139784946,
536
+ "r":0.9550561798,
537
+ "f":0.9340659341
538
  },
539
  "Degree":{
540
+ "p":0.7666666667,
541
+ "r":0.6052631579,
542
+ "f":0.6764705882
543
  }
544
  },
545
+ "sents_p":0.9268292683,
546
+ "sents_r":0.9429280397,
547
+ "sents_f":0.9348093481,
548
+ "dep_uas":0.8800087946,
549
+ "dep_las":0.8434013082,
550
  "dep_las_per_type":{
551
+
552
  },
553
+ "tag_acc":0.9309273776,
554
+ "lemma_acc":0.5646107578,
555
+ "ents_p":0.7754237288,
556
+ "ents_r":0.768907563,
557
+ "ents_f":0.7721518987,
558
  "ents_per_type":{
559
  "PERSON":{
560
+ "p":0.8888888889,
561
+ "r":0.875,
562
+ "f":0.8818897638
563
  },
564
  "GPE":{
565
+ "p":0.8229166667,
566
  "r":0.908045977,
567
+ "f":0.8633879781
568
  },
569
  "ORG":{
570
+ "p":0.6338028169,
571
+ "r":0.6338028169,
572
+ "f":0.6338028169
573
  },
574
  "PRODUCT":{
575
+ "p":0.25,
576
+ "r":0.125,
577
+ "f":0.1666666667
578
  },
579
  "EVENT":{
580
+ "p":1.0,
581
+ "r":0.3333333333,
582
  "f":0.5
583
  },
584
  "LOC":{
586
  "r":0.0,
587
  "f":0.0
588
  }
589
+ },
590
+ "speed":2386.8488947681
591
  },
592
  "sources":[
593
  {
594
+ "name":"UD Greek GDT v2.8",
595
  "url":"https://github.com/UniversalDependencies/UD_Greek-GDT",
596
  "license":"CC BY-NC-SA 3.0",
597
  "author":"Prokopidis, Prokopis"
morphologizer/cfg CHANGED
@@ -1,4 +1,5 @@
  {
+ "extend":false,
  "labels_morph":{
  "Case=Nom|Definite=Def|Gender=Fem|Number=Sing|POS=DET|PronType=Art":"Case=Nom|Definite=Def|Gender=Fem|Number=Sing|PronType=Art",
  "Foreign=Yes|POS=X":"Foreign=Yes",
@@ -714,5 +715,6 @@
  "Case=Gen|Gender=Fem|NumType=Ord|Number=Plur|POS=NUM":93,
  "Case=Dat|Definite=Def|Gender=Fem|Number=Sing|POS=DET|PronType=Art":90,
  "Case=Gen|Degree=Cmp|Gender=Masc|Number=Sing|POS=ADJ":84
- }
+ },
+ "overwrite":true
  }
morphologizer/model CHANGED
Binary files a/morphologizer/model and b/morphologizer/model differ
ner/model CHANGED
Binary files a/ner/model and b/ner/model differ
parser/model CHANGED
Binary files a/parser/model and b/parser/model differ
senter/cfg CHANGED
@@ -1,3 +1,3 @@
  {
-
+ "overwrite":false
  }
senter/model CHANGED
Binary files a/senter/model and b/senter/model differ
tok2vec/model CHANGED
Binary files a/tok2vec/model and b/tok2vec/model differ
tokenizer CHANGED
The diff for this file is too large to render.
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a04cb52774aa8a466834f48e1a6c99394f3fdea9f11e1f8981882edbc044edc4
- size 25442553
+ oid sha256:1e43c51edcca429c5d1125f81e66c9902e9c0c7322b4c585b8f9c4f410202f99
+ size 30279107
vocab/vectors.cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "mode":"default"
+ }
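
This commit moves the pipeline's declared `spacy_version` range in `meta.json` from `>=3.1.0,<3.2.0` to `>=3.2.0,<3.3.0`. The sketch below shows what such a range means for an installed spaCy version. It is a deliberately simplified check, assuming plain `X.Y.Z` versions and only the `>=` and `<` operators; `is_compatible` is a hypothetical helper, not the resolver that pip or spaCy actually uses.

```python
def parse_version(text: str) -> tuple:
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_compatible(installed: str, spec: str) -> bool:
    """Check a version against a comma-separated range of '>=' and '<' clauses.

    Any other operator raises ValueError; pre-release tags are not handled.
    """
    version = parse_version(installed)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if version < parse_version(clause[2:]):
                return False
        elif clause.startswith("<"):
            if version >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


NEW_RANGE = ">=3.2.0,<3.3.0"  # from the updated meta.json
for candidate in ("3.1.0", "3.2.0", "3.2.4", "3.3.0"):
    print(candidate, is_compatible(candidate, NEW_RANGE))
```

Under this range only the 3.2.x versions pass, which matches the model version bump from `3.1.0` to `3.2.0`.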