adrianeboyd committed on
Commit
6f612fd
1 Parent(s): be92e91

Update spaCy pipeline

README.md CHANGED
@@ -14,72 +14,72 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.8585657371
18
  - name: NER Recall
19
  type: recall
20
- value: 0.8979166667
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.8778004073
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
- value: 0.9831444348
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
  - name: POS (UPOS) Accuracy
36
  type: accuracy
37
- value: 0.9831444348
38
  - task:
39
  name: MORPH
40
  type: token-classification
41
  metrics:
42
  - name: Morph (UFeats) Accuracy
43
  type: accuracy
44
- value: 0.9809164003
45
  - task:
46
  name: LEMMA
47
  type: token-classification
48
  metrics:
49
  - name: Lemma Accuracy
50
  type: accuracy
51
- value: 0.9515738499
52
  - task:
53
  name: UNLABELED_DEPENDENCIES
54
  type: token-classification
55
  metrics:
56
  - name: Unlabeled Attachment Score (UAS)
57
  type: f_score
58
- value: 0.8842458101
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
  - name: Labeled Attachment Score (LAS)
64
  type: f_score
65
- value: 0.8566640599
66
  - task:
67
  name: SENTS
68
  type: token-classification
69
  metrics:
70
  - name: Sentences F-Score
71
  type: f_score
72
- value: 0.9494232476
73
  ---
74
  ### Details: https://spacy.io/models/da#da_core_news_trf
75
 
76
- Danish transformer pipeline (vesteinn/DanskBERT). Components: transformer, morphologizer, parser, lemmatizer (trainable_lemmatizer), ner, attribute_ruler.
77
 
78
  | Feature | Description |
79
  | --- | --- |
80
  | **Name** | `da_core_news_trf` |
81
- | **Version** | `3.6.1` |
82
- | **spaCy** | `>=3.6.0,<3.7.0` |
83
  | **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
84
  | **Components** | `transformer`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
85
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
@@ -109,18 +109,18 @@ Danish transformer pipeline (vesteinn/DanskBERT). Components: transformer, morph
109
  | `TOKEN_P` | 99.78 |
110
  | `TOKEN_R` | 99.75 |
111
  | `TOKEN_F` | 99.76 |
112
- | `POS_ACC` | 98.31 |
113
- | `MORPH_ACC` | 98.09 |
114
- | `MORPH_MICRO_P` | 99.07 |
115
- | `MORPH_MICRO_R` | 98.72 |
116
- | `MORPH_MICRO_F` | 98.90 |
117
- | `SENTS_P` | 95.03 |
118
- | `SENTS_R` | 94.86 |
119
- | `SENTS_F` | 94.94 |
120
- | `DEP_UAS` | 88.42 |
121
- | `DEP_LAS` | 85.67 |
122
- | `LEMMA_ACC` | 95.16 |
123
- | `TAG_ACC` | 98.31 |
124
- | `ENTS_P` | 85.86 |
125
- | `ENTS_R` | 89.79 |
126
- | `ENTS_F` | 87.78 |
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.8866396761
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.9125
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.8993839836
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
+ value: 0.9870683392
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
  - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.9870683392
38
  - task:
39
  name: MORPH
40
  type: token-classification
41
  metrics:
42
  - name: Morph (UFeats) Accuracy
43
  type: accuracy
44
+ value: 0.9845498135
45
  - task:
46
  name: LEMMA
47
  type: token-classification
48
  metrics:
49
  - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.9579661017
52
  - task:
53
  name: UNLABELED_DEPENDENCIES
54
  type: token-classification
55
  metrics:
56
  - name: Unlabeled Attachment Score (UAS)
57
  type: f_score
58
+ value: 0.897934115
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
  - name: Labeled Attachment Score (LAS)
64
  type: f_score
65
+ value: 0.8740509156
66
  - task:
67
  name: SENTS
68
  type: token-classification
69
  metrics:
70
  - name: Sentences F-Score
71
  type: f_score
72
+ value: 0.9302736099
73
  ---
74
  ### Details: https://spacy.io/models/da#da_core_news_trf
75
 
76
+ Danish transformer pipeline (Transformer(name='vesteinn/DanskBERT', piece_encoder='xlm-roberta-sentencepiece', stride=120, type='xlm-roberta', width=768, window=152, vocab_size=50005)). Components: transformer, morphologizer, parser, lemmatizer (trainable_lemmatizer), ner, attribute_ruler.
77
 
78
  | Feature | Description |
79
  | --- | --- |
80
  | **Name** | `da_core_news_trf` |
81
+ | **Version** | `3.7.2` |
82
+ | **spaCy** | `>=3.7.0,<3.8.0` |
83
  | **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
84
  | **Components** | `transformer`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
85
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
109
  | `TOKEN_P` | 99.78 |
110
  | `TOKEN_R` | 99.75 |
111
  | `TOKEN_F` | 99.76 |
112
+ | `POS_ACC` | 98.71 |
113
+ | `MORPH_ACC` | 98.45 |
114
+ | `MORPH_MICRO_P` | 99.30 |
115
+ | `MORPH_MICRO_R` | 98.86 |
116
+ | `MORPH_MICRO_F` | 99.08 |
117
+ | `SENTS_P` | 92.62 |
118
+ | `SENTS_R` | 93.44 |
119
+ | `SENTS_F` | 93.03 |
120
+ | `DEP_UAS` | 89.79 |
121
+ | `DEP_LAS` | 87.41 |
122
+ | `LEMMA_ACC` | 95.80 |
123
+ | `TAG_ACC` | 98.71 |
124
+ | `ENTS_P` | 88.66 |
125
+ | `ENTS_R` | 91.25 |
126
+ | `ENTS_F` | 89.94 |
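The model-index metadata above stores each metric on a 0-1 scale, while the accuracy table reports the same numbers as percentages rounded to two decimals. A minimal sketch of the mapping (plain Python; the helper name is illustrative, not part of spaCy):

```python
# Convert a 0-1 model-index value to the rounded percentage
# shown in the model card's accuracy table.
def as_table_pct(value: float) -> float:
    return round(value * 100, 2)

print(as_table_pct(0.8993839836))  # ENTS_F  -> 89.94
print(as_table_pct(0.9870683392))  # POS_ACC -> 98.71
print(as_table_pct(0.9302736099))  # SENTS_F -> 93.03
```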
accuracy.json CHANGED
@@ -3,51 +3,51 @@
3
  "token_p": 0.9977732598,
4
  "token_r": 0.9974835463,
5
  "token_f": 0.997628382,
6
- "pos_acc": 0.9831444348,
7
- "morph_acc": 0.9809164003,
8
- "morph_micro_p": 0.9907294833,
9
- "morph_micro_r": 0.98717884,
10
- "morph_micro_f": 0.9889509747,
11
  "morph_per_feat": {
12
  "Mood": {
13
- "p": 0.9961795606,
14
  "r": 0.9942802669,
15
- "f": 0.9952290076
16
  },
17
  "Tense": {
18
- "p": 0.9872372372,
19
- "r": 0.9902108434,
20
- "f": 0.9887218045
21
  },
22
  "VerbForm": {
23
- "p": 0.9883792049,
24
- "r": 0.9889840881,
25
- "f": 0.988681554
26
  },
27
  "Voice": {
28
- "p": 0.996257485,
29
- "r": 0.9947683109,
30
- "f": 0.9955123411
31
  },
32
  "Definite": {
33
- "p": 0.9912490056,
34
- "r": 0.9845910707,
35
- "f": 0.9879088206
36
  },
37
  "Gender": {
38
- "p": 0.9859906604,
39
- "r": 0.9823861748,
40
- "f": 0.9841851174
41
  },
42
  "Number": {
43
- "p": 0.9895260539,
44
- "r": 0.9856546688,
45
- "f": 0.9875865674
46
  },
47
  "AdpType": {
48
- "p": 0.9982190561,
49
- "r": 0.991158267,
50
- "f": 0.9946761313
51
  },
52
  "PartType": {
53
  "p": 1.0,
@@ -55,29 +55,29 @@
55
  "f": 1.0
56
  },
57
  "Case": {
58
- "p": 0.9952305246,
59
  "r": 0.9889415482,
60
- "f": 0.9920760697
61
  },
62
  "Person": {
63
- "p": 0.9946808511,
64
- "r": 0.9964476021,
65
- "f": 0.9955634428
66
  },
67
  "PronType": {
68
- "p": 0.9942434211,
69
- "r": 0.9942434211,
70
- "f": 0.9942434211
71
  },
72
  "NumType": {
73
- "p": 0.972972973,
74
- "r": 0.9536423841,
75
- "f": 0.9632107023
76
  },
77
  "Degree": {
78
- "p": 0.9853836784,
79
- "r": 0.9746987952,
80
- "f": 0.9800121139
81
  },
82
  "Reflex": {
83
  "p": 1.0,
@@ -85,19 +85,19 @@
85
  "f": 1.0
86
  },
87
  "Number[psor]": {
88
- "p": 0.988372093,
89
- "r": 0.988372093,
90
- "f": 0.988372093
91
  },
92
  "Poss": {
93
- "p": 0.9886363636,
94
- "r": 0.9886363636,
95
- "f": 0.9886363636
96
  },
97
  "Foreign": {
98
- "p": 0.8571428571,
99
- "r": 0.6,
100
- "f": 0.7058823529
101
  },
102
  "Abbr": {
103
  "p": 1.0,
@@ -115,141 +115,146 @@
115
  "f": 1.0
116
  }
117
  },
118
- "sents_p": 0.9502664298,
119
- "sents_r": 0.9485815603,
120
- "sents_f": 0.9494232476,
121
- "dep_uas": 0.8842458101,
122
- "dep_las": 0.8566640599,
123
  "dep_las_per_type": {
124
  "advmod": {
125
- "p": 0.8081232493,
126
- "r": 0.8149717514,
127
- "f": 0.811533052
128
  },
129
  "root": {
130
- "p": 0.8989361702,
131
- "r": 0.8989361702,
132
- "f": 0.8989361702
133
  },
134
  "nsubj": {
135
- "p": 0.9171907757,
136
- "r": 0.9229957806,
137
- "f": 0.920084122
138
  },
139
  "case": {
140
- "p": 0.9323383085,
141
- "r": 0.9240631164,
142
- "f": 0.9281822684
143
  },
144
  "obl": {
145
- "p": 0.7925696594,
146
- "r": 0.7950310559,
147
- "f": 0.7937984496
148
  },
149
  "cc": {
150
- "p": 0.8746355685,
151
- "r": 0.8720930233,
152
- "f": 0.8733624454
153
  },
154
  "conj": {
155
- "p": 0.7639257294,
156
- "r": 0.768,
157
- "f": 0.7659574468
158
  },
159
  "obj": {
160
- "p": 0.9013282732,
161
- "r": 0.9223300971,
162
- "f": 0.9117082534
163
  },
164
  "aux": {
165
- "p": 0.9104046243,
166
- "r": 0.9183673469,
167
- "f": 0.9143686502
168
  },
169
  "acl:relcl": {
170
- "p": 0.7,
171
- "r": 0.6810810811,
172
- "f": 0.6904109589
173
  },
174
  "advmod:lmod": {
175
- "p": 0.8153846154,
176
- "r": 0.7910447761,
177
- "f": 0.803030303
178
  },
179
  "det": {
180
- "p": 0.9363784666,
181
  "r": 0.9456342669,
182
- "f": 0.9409836066
183
  },
184
  "amod": {
185
- "p": 0.8798646362,
186
- "r": 0.8873720137,
187
- "f": 0.8836023789
188
  },
189
  "nmod:poss": {
190
- "p": 0.7745098039,
191
- "r": 0.7821782178,
192
- "f": 0.7783251232
193
  },
194
  "ccomp": {
195
- "p": 0.7301587302,
196
- "r": 0.7419354839,
197
- "f": 0.736
198
  },
199
  "nummod": {
200
- "p": 0.8429752066,
201
- "r": 0.85,
202
- "f": 0.846473029
203
  },
204
  "flat": {
205
- "p": 0.8625,
206
- "r": 0.9139072848,
207
- "f": 0.8874598071
208
  },
209
  "compound:prt": {
210
- "p": 0.6764705882,
211
- "r": 0.5609756098,
212
- "f": 0.6133333333
213
  },
214
  "advcl": {
215
- "p": 0.7413793103,
216
- "r": 0.7413793103,
217
- "f": 0.7413793103
218
  },
219
  "mark": {
220
- "p": 0.9173553719,
221
- "r": 0.9117043121,
222
- "f": 0.9145211123
223
  },
224
  "cop": {
225
- "p": 0.901734104,
226
- "r": 0.8914285714,
227
- "f": 0.8965517241
228
  },
229
  "dep": {
230
- "p": 0.2307692308,
231
- "r": 0.3396226415,
232
- "f": 0.2748091603
233
  },
234
  "nmod": {
235
- "p": 0.7693920335,
236
- "r": 0.716796875,
237
- "f": 0.7421638018
238
  },
239
  "iobj": {
240
- "p": 0.9285714286,
241
- "r": 0.5909090909,
242
- "f": 0.7222222222
243
  },
244
  "xcomp": {
245
- "p": 0.6595744681,
246
- "r": 0.5254237288,
247
- "f": 0.5849056604
248
  },
249
  "list": {
250
- "p": 0.5,
251
- "r": 0.4444444444,
252
- "f": 0.4705882353
253
  },
254
  "vocative": {
255
  "p": 0.0,
@@ -257,62 +262,57 @@
257
  "f": 0.0
258
  },
259
  "fixed": {
260
- "p": 0.9210526316,
261
- "r": 0.8536585366,
262
- "f": 0.8860759494
263
  },
264
  "expl": {
265
- "p": 0.9393939394,
266
  "r": 0.9117647059,
267
- "f": 0.9253731343
268
  },
269
  "appos": {
270
- "p": 0.6315789474,
271
- "r": 0.7272727273,
272
- "f": 0.676056338
273
  },
274
  "obl:tmod": {
275
- "p": 0.7272727273,
276
- "r": 0.4444444444,
277
- "f": 0.5517241379
278
  },
279
  "discourse": {
280
  "p": 0.0,
281
  "r": 0.0,
282
  "f": 0.0
283
- },
284
- "obl:lmod": {
285
- "p": 0.0,
286
- "r": 0.0,
287
- "f": 0.0
288
  }
289
  },
290
- "lemma_acc": 0.9515738499,
291
- "tag_acc": 0.9831444348,
292
- "ents_p": 0.8585657371,
293
- "ents_r": 0.8979166667,
294
- "ents_f": 0.8778004073,
295
  "ents_per_type": {
296
  "PER": {
297
- "p": 0.9493670886,
298
- "r": 0.9036144578,
299
- "f": 0.9259259259
300
  },
301
  "ORG": {
302
- "p": 0.8720930233,
303
- "r": 0.8333333333,
304
- "f": 0.8522727273
305
- },
306
- "MISC": {
307
- "p": 0.7163120567,
308
- "r": 0.8938053097,
309
- "f": 0.7952755906
310
  },
311
  "LOC": {
312
- "p": 0.8974358974,
313
  "r": 0.9459459459,
314
- "f": 0.9210526316
315
  }
316
  },
317
- "speed": 655.5887888543
318
  }
3
  "token_p": 0.9977732598,
4
  "token_r": 0.9974835463,
5
  "token_f": 0.997628382,
6
+ "pos_acc": 0.9870683392,
7
+ "morph_acc": 0.9845498135,
8
+ "morph_micro_p": 0.9929531052,
9
+ "morph_micro_r": 0.9886426733,
10
+ "morph_micro_f": 0.9907932011,
11
  "morph_per_feat": {
12
  "Mood": {
13
+ "p": 0.9952290076,
14
  "r": 0.9942802669,
15
+ "f": 0.9947544111
16
  },
17
  "Tense": {
18
+ "p": 0.9924242424,
19
+ "r": 0.9864457831,
20
+ "f": 0.9894259819
21
  },
22
  "VerbForm": {
23
+ "p": 0.9919901417,
24
+ "r": 0.9853121175,
25
+ "f": 0.9886398526
26
  },
27
  "Voice": {
28
+ "p": 0.9947643979,
29
+ "r": 0.9940209268,
30
+ "f": 0.9943925234
31
  },
32
  "Definite": {
33
+ "p": 0.9948227798,
34
+ "r": 0.9869616752,
35
+ "f": 0.9908766363
36
  },
37
  "Gender": {
38
+ "p": 0.9886439546,
39
+ "r": 0.9837155201,
40
+ "f": 0.9861735799
41
  },
42
  "Number": {
43
+ "p": 0.9918763103,
44
+ "r": 0.987219614,
45
+ "f": 0.9895424837
46
  },
47
  "AdpType": {
48
+ "p": 0.9991134752,
49
+ "r": 0.9964633068,
50
+ "f": 0.9977866313
51
  },
52
  "PartType": {
53
  "p": 1.0,
55
  "f": 1.0
56
  },
57
  "Case": {
58
+ "p": 0.9968152866,
59
  "r": 0.9889415482,
60
+ "f": 0.9928628073
61
  },
62
  "Person": {
63
+ "p": 0.9982238011,
64
+ "r": 0.9982238011,
65
+ "f": 0.9982238011
66
  },
67
  "PronType": {
68
+ "p": 0.9975308642,
69
+ "r": 0.9967105263,
70
+ "f": 0.9971205265
71
  },
72
  "NumType": {
73
+ "p": 0.9867549669,
74
+ "r": 0.9867549669,
75
+ "f": 0.9867549669
76
  },
77
  "Degree": {
78
+ "p": 0.9795918367,
79
+ "r": 0.9831325301,
80
+ "f": 0.9813589898
81
  },
82
  "Reflex": {
83
  "p": 1.0,
85
  "f": 1.0
86
  },
87
  "Number[psor]": {
88
+ "p": 1.0,
89
+ "r": 1.0,
90
+ "f": 1.0
91
  },
92
  "Poss": {
93
+ "p": 1.0,
94
+ "r": 1.0,
95
+ "f": 1.0
96
  },
97
  "Foreign": {
98
+ "p": 1.0,
99
+ "r": 0.7,
100
+ "f": 0.8235294118
101
  },
102
  "Abbr": {
103
  "p": 1.0,
115
  "f": 1.0
116
  }
117
  },
118
+ "sents_p": 0.9261862917,
119
+ "sents_r": 0.9343971631,
120
+ "sents_f": 0.9302736099,
121
+ "dep_uas": 0.897934115,
122
+ "dep_las": 0.8740509156,
123
  "dep_las_per_type": {
124
  "advmod": {
125
+ "p": 0.8345323741,
126
+ "r": 0.8192090395,
127
+ "f": 0.8267997149
128
  },
129
  "root": {
130
+ "p": 0.9014084507,
131
+ "r": 0.9078014184,
132
+ "f": 0.9045936396
133
  },
134
  "nsubj": {
135
+ "p": 0.9345991561,
136
+ "r": 0.9345991561,
137
+ "f": 0.9345991561
138
  },
139
  "case": {
140
+ "p": 0.9358717435,
141
+ "r": 0.9211045365,
142
+ "f": 0.9284294235
143
  },
144
  "obl": {
145
+ "p": 0.8388625592,
146
+ "r": 0.8245341615,
147
+ "f": 0.8316366484
148
  },
149
  "cc": {
150
+ "p": 0.898255814,
151
+ "r": 0.898255814,
152
+ "f": 0.898255814
153
  },
154
  "conj": {
155
+ "p": 0.816,
156
+ "r": 0.816,
157
+ "f": 0.816
158
  },
159
  "obj": {
160
+ "p": 0.9128787879,
161
+ "r": 0.9359223301,
162
+ "f": 0.9242569511
163
  },
164
  "aux": {
165
+ "p": 0.9176470588,
166
+ "r": 0.9096209913,
167
+ "f": 0.9136163982
168
  },
169
  "acl:relcl": {
170
+ "p": 0.7932960894,
171
+ "r": 0.7675675676,
172
+ "f": 0.7802197802
173
  },
174
  "advmod:lmod": {
175
+ "p": 0.8805970149,
176
+ "r": 0.8805970149,
177
+ "f": 0.8805970149
178
  },
179
  "det": {
180
+ "p": 0.9456342669,
181
  "r": 0.9456342669,
182
+ "f": 0.9456342669
183
  },
184
  "amod": {
185
+ "p": 0.8939393939,
186
+ "r": 0.9061433447,
187
+ "f": 0.9
188
  },
189
  "nmod:poss": {
190
+ "p": 0.8,
191
+ "r": 0.7920792079,
192
+ "f": 0.7960199005
193
  },
194
  "ccomp": {
195
+ "p": 0.7105263158,
196
+ "r": 0.8709677419,
197
+ "f": 0.7826086957
198
  },
199
  "nummod": {
200
+ "p": 0.8174603175,
201
+ "r": 0.8583333333,
202
+ "f": 0.837398374
203
  },
204
  "flat": {
205
+ "p": 0.8805031447,
206
+ "r": 0.9271523179,
207
+ "f": 0.9032258065
208
  },
209
  "compound:prt": {
210
+ "p": 0.6666666667,
211
+ "r": 0.6341463415,
212
+ "f": 0.65
213
  },
214
  "advcl": {
215
+ "p": 0.7699115044,
216
+ "r": 0.75,
217
+ "f": 0.7598253275
218
  },
219
  "mark": {
220
+ "p": 0.9371069182,
221
+ "r": 0.9178644764,
222
+ "f": 0.9273858921
223
  },
224
  "cop": {
225
+ "p": 0.9152542373,
226
+ "r": 0.9257142857,
227
+ "f": 0.9204545455
228
  },
229
  "dep": {
230
+ "p": 0.2095238095,
231
+ "r": 0.4150943396,
232
+ "f": 0.2784810127
233
  },
234
  "nmod": {
235
+ "p": 0.784989858,
236
+ "r": 0.755859375,
237
+ "f": 0.7701492537
238
  },
239
  "iobj": {
240
+ "p": 0.9375,
241
+ "r": 0.6818181818,
242
+ "f": 0.7894736842
243
  },
244
  "xcomp": {
245
+ "p": 0.7346938776,
246
+ "r": 0.6101694915,
247
+ "f": 0.6666666667
248
+ },
249
+ "obl:lmod": {
250
+ "p": 0.0,
251
+ "r": 0.0,
252
+ "f": 0.0
253
  },
254
  "list": {
255
+ "p": 0.4,
256
+ "r": 0.3333333333,
257
+ "f": 0.3636363636
258
  },
259
  "vocative": {
260
  "p": 0.0,
262
  "f": 0.0
263
  },
264
  "fixed": {
265
+ "p": 0.9473684211,
266
+ "r": 0.8780487805,
267
+ "f": 0.9113924051
268
  },
269
  "expl": {
270
+ "p": 0.96875,
271
  "r": 0.9117647059,
272
+ "f": 0.9393939394
273
  },
274
  "appos": {
275
+ "p": 0.7297297297,
276
+ "r": 0.8181818182,
277
+ "f": 0.7714285714
278
  },
279
  "obl:tmod": {
280
+ "p": 1.0,
281
+ "r": 0.6111111111,
282
+ "f": 0.7586206897
283
  },
284
  "discourse": {
285
  "p": 0.0,
286
  "r": 0.0,
287
  "f": 0.0
288
  }
289
  },
290
+ "lemma_acc": 0.9579661017,
291
+ "tag_acc": 0.9870683392,
292
+ "ents_p": 0.8866396761,
293
+ "ents_r": 0.9125,
294
+ "ents_f": 0.8993839836,
295
  "ents_per_type": {
296
+ "MISC": {
297
+ "p": 0.7846153846,
298
+ "r": 0.9026548673,
299
+ "f": 0.8395061728
300
+ },
301
  "PER": {
302
+ "p": 0.950310559,
303
+ "r": 0.921686747,
304
+ "f": 0.9357798165
305
  },
306
  "ORG": {
307
+ "p": 0.8764044944,
308
+ "r": 0.8666666667,
309
+ "f": 0.8715083799
310
  },
311
  "LOC": {
312
+ "p": 0.9210526316,
313
  "r": 0.9459459459,
314
+ "f": 0.9333333333
315
  }
316
  },
317
+ "speed": 523.1005752745
318
  }
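Each `f` value in accuracy.json is the harmonic mean of the corresponding `p` and `r`, so the updated totals can be cross-checked directly. A small sketch (plain Python; `f_score` is an illustrative helper, not a spaCy API):

```python
# Harmonic mean of precision and recall, as used for the "f" fields above.
def f_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Cross-check the updated NER totals from accuracy.json.
ents_p, ents_r, ents_f = 0.8866396761, 0.9125, 0.8993839836
assert abs(f_score(ents_p, ents_r) - ents_f) < 1e-8
```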
config.cfg CHANGED
@@ -17,6 +17,7 @@ after_creation = null
17
  after_pipeline_creation = null
18
  batch_size = 64
19
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
 
20
 
21
  [components]
22
 
@@ -39,10 +40,11 @@ nO = null
39
  normalize = false
40
 
41
  [components.lemmatizer.model.tok2vec]
42
- @architectures = "spacy-transformers.TransformerListener.v1"
43
- grad_factor = 1.0
44
  upstream = "transformer"
45
  pooling = {"@layers":"reduce_mean.v1"}
 
46
 
47
  [components.morphologizer]
48
  factory = "morphologizer"
@@ -57,10 +59,11 @@ nO = null
57
  normalize = false
58
 
59
  [components.morphologizer.model.tok2vec]
60
- @architectures = "spacy-transformers.TransformerListener.v1"
61
- grad_factor = 1.0
62
  upstream = "transformer"
63
  pooling = {"@layers":"reduce_mean.v1"}
 
64
 
65
  [components.ner]
66
  factory = "ner"
@@ -79,10 +82,11 @@ use_upper = false
79
  nO = null
80
 
81
  [components.ner.model.tok2vec]
82
- @architectures = "spacy-transformers.TransformerListener.v1"
83
- grad_factor = 1.0
84
  upstream = "transformer"
85
  pooling = {"@layers":"reduce_mean.v1"}
 
86
 
87
  [components.parser]
88
  factory = "parser"
@@ -102,32 +106,44 @@ use_upper = false
102
  nO = null
103
 
104
  [components.parser.model.tok2vec]
105
- @architectures = "spacy-transformers.TransformerListener.v1"
106
- grad_factor = 1.0
107
  upstream = "transformer"
108
  pooling = {"@layers":"reduce_mean.v1"}
 
109
 
110
  [components.transformer]
111
- factory = "transformer"
112
- max_batch_items = 4096
113
- set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
114
 
115
  [components.transformer.model]
116
- name = "vesteinn/DanskBERT"
117
- @architectures = "spacy-transformers.TransformerModel.v3"
118
  mixed_precision = false
119
-
120
- [components.transformer.model.get_spans]
121
- @span_getters = "spacy-transformers.strided_spans.v1"
122
- window = 128
123
- stride = 96
124
 
125
  [components.transformer.model.grad_scaler_config]
126
 
127
- [components.transformer.model.tokenizer_config]
128
- use_fast = true
129
-
130
- [components.transformer.model.transformer_config]
 
131
 
132
  [corpora]
133
 
@@ -164,11 +180,11 @@ annotating_components = []
164
  before_update = null
165
 
166
  [training.batcher]
167
- @batchers = "spacy.batch_by_padded.v1"
168
- discard_oversize = true
169
- get_length = null
170
  size = 2000
171
- buffer = 256
 
172
 
173
  [training.logger]
174
  @loggers = "spacy.ConsoleLogger.v1"
@@ -246,6 +262,18 @@ require = false
246
  path = "corpus/labels/parser.json"
247
  require = false
248
 
249
  [initialize.lookups]
250
  @misc = "spacy.LookupsDataLoader.v1"
251
  lang = ${nlp.lang}
17
  after_pipeline_creation = null
18
  batch_size = 64
19
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
20
+ vectors = {"@vectors":"spacy.Vectors.v1"}
21
 
22
  [components]
23
 
40
  normalize = false
41
 
42
  [components.lemmatizer.model.tok2vec]
43
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
44
+ width = ${components.transformer.model.hidden_width}
45
  upstream = "transformer"
46
  pooling = {"@layers":"reduce_mean.v1"}
47
+ grad_factor = 1.0
48
 
49
  [components.morphologizer]
50
  factory = "morphologizer"
59
  normalize = false
60
 
61
  [components.morphologizer.model.tok2vec]
62
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
63
+ width = ${components.transformer.model.hidden_width}
64
  upstream = "transformer"
65
  pooling = {"@layers":"reduce_mean.v1"}
66
+ grad_factor = 1.0
67
 
68
  [components.ner]
69
  factory = "ner"
82
  nO = null
83
 
84
  [components.ner.model.tok2vec]
85
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
86
+ width = ${components.transformer.model.hidden_width}
87
  upstream = "transformer"
88
  pooling = {"@layers":"reduce_mean.v1"}
89
+ grad_factor = 1.0
90
 
91
  [components.parser]
92
  factory = "parser"
106
  nO = null
107
 
108
  [components.parser.model.tok2vec]
109
+ @architectures = "spacy-curated-transformers.LastTransformerLayerListener.v1"
110
+ width = ${components.transformer.model.hidden_width}
111
  upstream = "transformer"
112
  pooling = {"@layers":"reduce_mean.v1"}
113
+ grad_factor = 1.0
114
 
115
  [components.transformer]
116
+ factory = "curated_transformer"
117
+ all_layer_outputs = false
118
+ frozen = false
119
 
120
  [components.transformer.model]
121
+ @architectures = "spacy-curated-transformers.XlmrTransformer.v1"
122
+ vocab_size = 50005
123
+ hidden_width = 768
124
+ piece_encoder = {"@architectures":"spacy-curated-transformers.XlmrSentencepieceEncoder.v1"}
125
+ attention_probs_dropout_prob = 0.1
126
+ hidden_act = "gelu"
127
+ hidden_dropout_prob = 0.1
128
+ intermediate_width = 3072
129
+ layer_norm_eps = 0.00001
130
+ max_position_embeddings = 514
131
+ model_max_length = 512
132
+ num_attention_heads = 12
133
+ num_hidden_layers = 12
134
+ padding_idx = 1
135
+ type_vocab_size = 1
136
+ torchscript = false
137
  mixed_precision = false
138
+ wrapped_listener = null
139
 
140
  [components.transformer.model.grad_scaler_config]
141
 
142
+ [components.transformer.model.with_spans]
143
+ @architectures = "spacy-curated-transformers.WithStridedSpans.v1"
144
+ stride = 120
145
+ window = 152
146
+ batch_size = 384
147
 
148
  [corpora]
149
 
180
  before_update = null
181
 
182
  [training.batcher]
183
+ @batchers = "spacy.batch_by_words.v1"
184
+ discard_oversize = false
 
185
  size = 2000
186
+ tolerance = 0.2
187
+ get_length = null
188
 
189
  [training.logger]
190
  @loggers = "spacy.ConsoleLogger.v1"
262
  path = "corpus/labels/parser.json"
263
  require = false
264
 
265
+ [initialize.components.transformer]
266
+
267
+ [initialize.components.transformer.encoder_loader]
268
+ @model_loaders = "spacy-curated-transformers.HFTransformerEncoderLoader.v1"
269
+ name = "vesteinn/DanskBERT"
270
+ revision = "main"
271
+
272
+ [initialize.components.transformer.piecer_loader]
273
+ @model_loaders = "spacy-curated-transformers.HFPieceEncoderLoader.v1"
274
+ name = "vesteinn/DanskBERT"
275
+ revision = "main"
276
+
277
  [initialize.lookups]
278
  @misc = "spacy.LookupsDataLoader.v1"
279
  lang = ${nlp.lang}
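The new config replaces the old strided-span getter (window = 128, stride = 96) with spacy-curated-transformers' `WithStridedSpans` at window = 152, stride = 120, so consecutive windows overlap by 32 pieces. An illustrative sketch of that windowing (plain Python, not the library's internals):

```python
# Sketch of strided windowing over a sequence of n_pieces token pieces:
# windows start every `stride` pieces and span up to `window` pieces,
# so adjacent windows overlap by window - stride pieces of context.
def strided_windows(n_pieces: int, window: int = 152, stride: int = 120):
    starts = range(0, max(n_pieces, 1), stride)
    return [(s, min(s + window, n_pieces)) for s in starts]

print(strided_windows(300))  # [(0, 152), (120, 272), (240, 300)]
```

With the defaults from this config, every piece past the first window is seen with at least 32 pieces of left context, which is what lets the transformer process documents longer than its 512-piece maximum length.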
da_core_news_trf-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:46cbd7cbcfa6a575e98ffd709ff7479612d2881978bc276576553dc47fd2fe72
3
- size 444187820
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:89bfa83b2c7a53376976a12ac112ac91b910990e341b7c46d7faaef5fcc620e2
3
+ size 440703099
lemmatizer/cfg CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "labels":[
3
- 1,
4
  2,
5
  4,
6
  6,
@@ -10,16 +10,15 @@
10
  14,
11
  16,
12
  18,
13
- 20,
14
- 24,
15
  28,
16
  30,
17
  32,
18
- 34,
19
- 36,
20
  39,
21
  41,
22
- 42,
23
  43,
24
  45,
25
  47,
@@ -27,10 +26,10 @@
27
  51,
28
  53,
29
  55,
30
- 57,
31
- 61,
32
- 65,
33
  67,
 
34
  71,
35
  73,
36
  75,
@@ -44,62 +43,60 @@
44
  91,
45
  93,
46
  95,
47
- 99,
48
  101,
49
- 102,
50
  104,
51
- 107,
 
52
  111,
53
- 113,
54
- 116,
55
- 118,
56
- 121,
57
- 124,
58
- 127,
59
- 128,
60
- 131,
61
  133,
62
- 134,
63
  136,
64
  138,
65
  140,
66
  142,
67
  144,
68
- 145,
69
- 147,
70
  148,
71
- 149,
72
- 153,
73
- 155,
 
74
  158,
 
75
  161,
76
  164,
77
- 166,
78
- 168,
79
  170,
80
  172,
81
  174,
82
- 175,
83
- 177,
84
- 179,
85
  182,
86
- 184,
87
- 186,
88
- 188,
89
  190,
90
  192,
91
  194,
92
  196,
93
- 199,
94
- 201,
95
- 203,
96
  204,
97
  207,
98
- 208,
99
  209,
100
  211,
101
  212,
102
- 213,
103
  215,
104
  217,
105
  219,
@@ -107,173 +104,176 @@
107
  222,
108
  224,
109
  226,
110
- 229,
 
111
  231,
112
- 232,
113
- 234,
114
  235,
115
- 237,
116
  238,
117
- 242,
118
  248,
 
119
  252,
120
- 254,
121
  256,
122
- 258,
123
- 260,
124
  262,
125
- 263,
126
- 264,
127
- 265,
128
  268,
129
  270,
130
- 271,
131
- 273,
132
- 275,
133
  277,
 
134
  279,
135
- 281,
 
136
  285,
137
- 286,
138
- 288,
139
- 290,
140
  292,
141
  294,
142
- 295,
143
- 297,
144
- 298,
145
- 299,
146
  301,
147
  303,
148
- 306,
 
149
  309,
150
- 310,
151
- 312,
 
152
  315,
153
- 316,
154
- 318,
155
  319,
156
- 321,
157
- 322,
158
- 324,
159
  326,
160
  327,
161
  329,
162
- 331,
163
  333,
164
  335,
165
- 337,
 
166
  339,
167
  340,
168
  342,
169
  343,
170
- 347,
171
- 349,
172
  350,
173
- 354,
174
- 356,
175
- 360,
 
176
  363,
177
- 364,
178
- 365,
179
  368,
180
- 369,
181
  371,
182
  372,
183
- 373,
 
184
  376,
185
- 377,
186
  380,
187
  383,
188
  386,
 
189
  390,
190
- 392,
191
- 394,
192
  395,
193
  396,
194
  397,
 
195
  399,
 
196
  401,
197
  403,
198
  405,
199
  406,
200
- 407,
201
  409,
202
- 411,
203
- 413,
 
204
  416,
205
  418,
206
  419,
207
  420,
208
- 421,
209
  422,
210
  424,
211
  426,
212
- 429,
213
  430,
214
- 432,
215
  434,
216
  435,
217
  437,
 
218
  439,
219
  441,
220
- 443,
221
- 445,
222
- 447,
223
- 449,
 
224
  451,
225
  453,
226
  454,
227
  456,
228
  457,
229
  459,
230
- 460,
231
- 462,
232
  464,
233
- 465,
234
- 469,
235
  470,
236
  471,
237
- 473,
238
  474,
239
- 475,
240
  477,
241
- 480,
242
  481,
243
- 484,
244
  485,
 
245
  488,
246
  489,
247
- 491,
248
  493,
249
- 494,
250
- 497,
251
  498,
252
  501,
253
  503,
 
 
254
  506,
 
255
  508,
256
- 509,
257
  510,
258
  511,
259
- 513,
260
  514,
261
- 515,
262
  517,
 
263
  519,
264
  521,
265
  522,
266
- 523,
267
  524,
268
- 525,
269
- 527,
270
  528,
271
  530,
272
- 531,
273
  532,
274
  533,
 
275
  535,
276
- 536,
277
  537,
278
  538,
279
  540,
@@ -284,95 +284,95 @@
284
  548,
285
  549,
286
  551,
287
- 552,
288
- 554,
289
- 558,
290
  560,
291
  561,
292
  562,
293
  563,
294
- 564,
295
  565,
296
  567,
 
297
  569,
298
  570,
299
  572,
300
  574,
301
  575,
302
- 576,
303
  577,
304
  579,
305
  580,
306
- 584,
307
- 586,
 
308
  589,
309
- 591,
310
- 594,
311
  596,
312
  598,
 
313
  602,
314
- 603,
315
- 605,
316
  606,
 
 
317
  610,
318
- 612,
319
- 614,
320
- 618,
321
- 620,
322
- 621,
323
- 622,
324
  623,
325
- 624,
326
  625,
327
  626,
 
328
  628,
 
329
  630,
330
- 631,
331
- 633,
332
  634,
 
333
  636,
334
- 637,
335
- 640,
 
336
  642,
337
- 646,
338
- 647,
 
339
  648,
340
  650,
341
- 653,
342
  655,
343
  656,
344
  658,
345
- 659,
346
- 660,
347
- 662,
348
  663,
349
  664,
350
  666,
351
- 668,
352
  669,
 
353
  671,
354
  673,
355
- 674,
356
- 676,
357
- 678,
358
  679,
359
- 681,
360
- 683,
361
  684,
362
  685,
363
  687,
364
- 688,
365
- 690,
366
  691,
367
- 693,
368
  694,
369
  695,
 
 
370
  698,
371
- 699,
372
- 700,
373
  701,
 
374
  703,
375
- 705,
376
  706,
377
  708,
378
  709,
@@ -385,72 +385,66 @@
385
  725,
386
  727,
387
  730,
 
388
  732,
389
- 734,
390
  735,
391
- 736,
392
  738,
 
393
  740,
394
  741,
395
  742,
396
- 743,
397
- 744,
398
  745,
399
  746,
400
- 747,
401
  748,
402
- 751,
403
  752,
404
- 754,
405
  755,
 
406
  759,
 
407
  762,
408
- 764,
409
  765,
410
- 766,
411
  767,
 
412
  769,
413
  770,
414
- 772,
415
- 774,
416
  775,
417
- 776,
418
  777,
419
  778,
420
- 782,
421
- 784,
422
- 785,
423
  786,
 
424
  789,
 
425
  792,
426
  793,
427
  795,
 
428
  797,
429
  798,
430
- 799,
431
  801,
432
  802,
433
  803,
434
- 804,
435
  805,
436
  807,
437
  808,
438
- 809,
439
  810,
440
  811,
441
  813,
442
  815,
443
- 817,
444
  818,
445
- 820,
446
- 822,
447
  823,
448
  825,
449
  827,
450
- 829,
451
- 831,
452
- 833,
453
- 835,
454
- 837
455
  ]
456
  }
1
  {
2
  "labels":[
3
+ 0,
4
  2,
5
  4,
6
  6,
10
  14,
11
  16,
12
  18,
13
+ 22,
14
+ 26,
15
  28,
16
  30,
17
  32,
18
+ 35,
19
+ 37,
20
  39,
21
  41,
 
22
  43,
23
  45,
24
  47,
26
  51,
27
  53,
28
  55,
29
+ 59,
30
+ 63,
 
31
  67,
32
+ 69,
33
  71,
34
  73,
35
  75,
43
  91,
44
  93,
45
  95,
46
+ 97,
47
  101,
48
+ 103,
49
  104,
50
+ 106,
51
+ 108,
52
  111,
53
+ 115,
54
+ 117,
55
+ 120,
56
+ 123,
57
+ 126,
58
+ 129,
59
+ 130,
 
60
  133,
61
+ 135,
62
  136,
63
  138,
64
  140,
65
  142,
66
  144,
67
+ 146,
 
68
  148,
69
+ 150,
70
+ 151,
71
+ 152,
72
+ 154,
73
  158,
74
+ 160,
75
  161,
76
  164,
77
+ 167,
 
78
  170,
79
  172,
80
  174,
81
+ 176,
82
+ 178,
83
+ 180,
84
  182,
85
+ 183,
86
+ 185,
87
+ 187,
88
  190,
89
  192,
90
  194,
91
  196,
92
+ 198,
93
+ 200,
94
+ 202,
95
  204,
96
  207,
 
97
  209,
98
  211,
99
  212,
 
100
  215,
101
  217,
102
  219,
104
  222,
105
  224,
106
  226,
107
+ 228,
108
+ 230,
109
  231,
110
+ 233,
 
111
  235,
 
112
  238,
+ 240,
+ 241,
+ 243,
+ 245,
+ 246,
  248,
+ 250,
  252,
  256,
  262,
+ 266,
  268,
  270,
+ 272,
+ 274,
+ 276,
  277,
+ 278,
  279,
+ 282,
+ 284,
  285,
+ 287,
+ 289,
+ 291,
  292,
  294,
+ 296,
+ 300,
  301,
  303,
+ 305,
+ 307,
  309,
+ 311,
+ 313,
+ 314,
  315,
+ 317,
  319,
+ 320,
+ 323,
  326,
  327,
  329,
+ 332,
  333,
  335,
+ 336,
+ 338,
  339,
  340,
  342,
  343,
+ 345,
+ 346,
  350,
+ 352,
+ 353,
+ 357,
+ 359,
  363,
+ 366,
+ 367,
  368,
  371,
  372,
+ 374,
+ 375,
  376,
+ 379,
  380,
  383,
  386,
+ 387,
  390,
+ 391,
  395,
  396,
  397,
+ 398,
  399,
+ 400,
  401,
  403,
  405,
  406,
+ 408,
  409,
+ 412,
+ 414,
+ 415,
  416,
  418,
  419,
  420,
  422,
  424,
  426,
+ 428,
  430,
+ 431,
  434,
  435,
  437,
+ 438,
  439,
  441,
+ 442,
+ 444,
+ 446,
+ 448,
+ 450,
  451,
  453,
  454,
  456,
  457,
  459,
+ 461,
+ 463,
  464,
+ 468,
  470,
  471,
+ 472,
  474,
  477,
+ 478,
  481,
+ 482,
  485,
+ 486,
  488,
  489,
+ 492,
  493,
+ 496,
  498,
  501,
  503,
+ 504,
+ 505,
  506,
+ 507,
  508,
  510,
  511,
+ 512,
  514,
+ 516,
  517,
+ 518,
  519,
  521,
  522,
  524,
+ 526,
  528,
  530,
  532,
  533,
+ 534,
  535,
  537,
  538,
  540,
  548,
  549,
  551,
+ 553,
+ 557,
+ 559,
  560,
  561,
  562,
  563,
  565,
  567,
+ 568,
  569,
  570,
  572,
  574,
  575,
  577,
  579,
  580,
+ 582,
+ 583,
+ 587,
  589,
+ 590,
+ 593,
  596,
  598,
+ 600,
  602,
  606,
+ 607,
+ 608,
  610,
+ 611,
+ 615,
+ 617,
+ 619,
  623,
  625,
  626,
+ 627,
  628,
+ 629,
  630,
+ 632,
  634,
+ 635,
  636,
+ 638,
+ 639,
+ 641,
  642,
+ 643,
+ 644,
+ 645,
  648,
  650,
+ 654,
  655,
  656,
  658,
+ 661,
  663,
  664,
  666,
+ 667,
  669,
+ 670,
  671,
  673,
+ 675,
+ 677,
  679,
+ 680,
+ 682,
  684,
  685,
  687,
+ 689,
  691,
+ 692,
  694,
  695,
+ 696,
+ 697,
  698,
  701,
+ 702,
  703,
+ 704,
  706,
  708,
  709,
  725,
  727,
  730,
+ 731,
  732,
+ 733,
  735,
+ 737,
  738,
+ 739,
  740,
  741,
  742,
  745,
  746,
  748,
  752,
  755,
+ 757,
  759,
+ 760,
  762,
+ 763,
  765,
  767,
+ 768,
  769,
  770,
+ 771,
  775,
  777,
  778,
+ 779,
+ 780,
+ 783,
  786,
+ 787,
  789,
+ 791,
  792,
  793,
  795,
+ 796,
  797,
  798,
+ 800,
  801,
  802,
  803,
  805,
  807,
  808,
  810,
  811,
  813,
  815,
+ 816,
  818,
+ 819,
+ 821,
  823,
  825,
  827,
+ 829
  ]
  }
lemmatizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:15284a67182922c447cf06f6058ce3fcaedc76b663173da4c7e726727647cea0
- size 1391005
+ oid sha256:0804a6d8b32f684ed40c9822a1b0883abe89ac3ed7e479934b3dd1b26d56c4d2
+ size 1372633
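The `oid`/`size` entries in these hunks are Git LFS pointer files, not the model weights themselves: each pointer is three `key value` lines naming the spec version, the SHA-256 of the real blob, and its byte size. A minimal sketch (plain Python, no LFS tooling assumed) of parsing such a pointer, using the new lemmatizer/model values from this diff:

```python
# Parse a Git LFS pointer file into a dict of its "key value" lines.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0804a6d8b32f684ed40c9822a1b0883abe89ac3ed7e479934b3dd1b26d56c4d2
size 1372633
"""

info = parse_lfs_pointer(pointer)
# "size" is the byte count of the actual lemmatizer/model blob stored in LFS.
print(info["oid"], int(info["size"]))
```

This is why the repo diff shows only the pointer changing: the binary blob itself is swapped out in LFS storage, and the tracked file just records the new hash and size.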
lemmatizer/trees CHANGED
Binary files a/lemmatizer/trees and b/lemmatizer/trees differ
meta.json CHANGED
@@ -1,14 +1,14 @@
  {
  "lang":"da",
  "name":"core_news_trf",
- "version":"3.6.1",
- "description":"Danish transformer pipeline (vesteinn/DanskBERT). Components: transformer, morphologizer, parser, lemmatizer (trainable_lemmatizer), ner, attribute_ruler.",
+ "version":"3.7.2",
+ "description":"Danish transformer pipeline (Transformer(name='vesteinn/DanskBERT', piece_encoder='xlm-roberta-sentencepiece', stride=120, type='xlm-roberta', width=768, window=152, vocab_size=50005)). Components: transformer, morphologizer, parser, lemmatizer (trainable_lemmatizer), ner, attribute_ruler.",
  "author":"Explosion",
  "email":"contact@explosion.ai",
  "url":"https://explosion.ai",
  "license":"CC BY-SA 4.0",
- "spacy_version":">=3.6.0,<3.7.0",
- "spacy_git_version":"c067b5264",
+ "spacy_version":">=3.7.0,<3.8.0",
+ "spacy_git_version":"6b4f77441",
  "vectors":{
  "width":0,
  "vectors":0,
@@ -246,51 +246,51 @@
  "token_p":0.9977732598,
  "token_r":0.9974835463,
  "token_f":0.997628382,
- "pos_acc":0.9831444348,
- "morph_acc":0.9809164003,
- "morph_micro_p":0.9907294833,
- "morph_micro_r":0.98717884,
- "morph_micro_f":0.9889509747,
+ "pos_acc":0.9870683392,
+ "morph_acc":0.9845498135,
+ "morph_micro_p":0.9929531052,
+ "morph_micro_r":0.9886426733,
+ "morph_micro_f":0.9907932011,
  "morph_per_feat":{
  "Mood":{
- "p":0.9961795606,
+ "p":0.9952290076,
  "r":0.9942802669,
- "f":0.9952290076
+ "f":0.9947544111
  },
  "Tense":{
- "p":0.9872372372,
- "r":0.9902108434,
- "f":0.9887218045
+ "p":0.9924242424,
+ "r":0.9864457831,
+ "f":0.9894259819
  },
  "VerbForm":{
- "p":0.9883792049,
- "r":0.9889840881,
- "f":0.988681554
+ "p":0.9919901417,
+ "r":0.9853121175,
+ "f":0.9886398526
  },
  "Voice":{
- "p":0.996257485,
- "r":0.9947683109,
- "f":0.9955123411
+ "p":0.9947643979,
+ "r":0.9940209268,
+ "f":0.9943925234
  },
  "Definite":{
- "p":0.9912490056,
- "r":0.9845910707,
- "f":0.9879088206
+ "p":0.9948227798,
+ "r":0.9869616752,
+ "f":0.9908766363
  },
  "Gender":{
- "p":0.9859906604,
- "r":0.9823861748,
- "f":0.9841851174
+ "p":0.9886439546,
+ "r":0.9837155201,
+ "f":0.9861735799
  },
  "Number":{
- "p":0.9895260539,
- "r":0.9856546688,
- "f":0.9875865674
+ "p":0.9918763103,
+ "r":0.987219614,
+ "f":0.9895424837
  },
  "AdpType":{
- "p":0.9982190561,
- "r":0.991158267,
- "f":0.9946761313
+ "p":0.9991134752,
+ "r":0.9964633068,
+ "f":0.9977866313
  },
  "PartType":{
  "p":1.0,
@@ -298,29 +298,29 @@
  "f":1.0
  },
  "Case":{
- "p":0.9952305246,
+ "p":0.9968152866,
  "r":0.9889415482,
- "f":0.9920760697
+ "f":0.9928628073
  },
  "Person":{
- "p":0.9946808511,
- "r":0.9964476021,
- "f":0.9955634428
+ "p":0.9982238011,
+ "r":0.9982238011,
+ "f":0.9982238011
  },
  "PronType":{
- "p":0.9942434211,
- "r":0.9942434211,
- "f":0.9942434211
+ "p":0.9975308642,
+ "r":0.9967105263,
+ "f":0.9971205265
  },
  "NumType":{
- "p":0.972972973,
- "r":0.9536423841,
- "f":0.9632107023
+ "p":0.9867549669,
+ "r":0.9867549669,
+ "f":0.9867549669
  },
  "Degree":{
- "p":0.9853836784,
- "r":0.9746987952,
- "f":0.9800121139
+ "p":0.9795918367,
+ "r":0.9831325301,
+ "f":0.9813589898
  },
  "Reflex":{
  "p":1.0,
@@ -328,19 +328,19 @@
  "f":1.0
  },
  "Number[psor]":{
- "p":0.988372093,
- "r":0.988372093,
- "f":0.988372093
+ "p":1.0,
+ "r":1.0,
+ "f":1.0
  },
  "Poss":{
- "p":0.9886363636,
- "r":0.9886363636,
- "f":0.9886363636
+ "p":1.0,
+ "r":1.0,
+ "f":1.0
  },
  "Foreign":{
- "p":0.8571428571,
- "r":0.6,
- "f":0.7058823529
+ "p":1.0,
+ "r":0.7,
+ "f":0.8235294118
  },
  "Abbr":{
  "p":1.0,
@@ -358,141 +358,146 @@
  "f":1.0
  }
  },
- "sents_p":0.9502664298,
- "sents_r":0.9485815603,
- "sents_f":0.9494232476,
- "dep_uas":0.8842458101,
- "dep_las":0.8566640599,
+ "sents_p":0.9261862917,
+ "sents_r":0.9343971631,
+ "sents_f":0.9302736099,
+ "dep_uas":0.897934115,
+ "dep_las":0.8740509156,
  "dep_las_per_type":{
  "advmod":{
- "p":0.8081232493,
- "r":0.8149717514,
- "f":0.811533052
+ "p":0.8345323741,
+ "r":0.8192090395,
+ "f":0.8267997149
  },
  "root":{
- "p":0.8989361702,
- "r":0.8989361702,
- "f":0.8989361702
+ "p":0.9014084507,
+ "r":0.9078014184,
+ "f":0.9045936396
  },
  "nsubj":{
- "p":0.9171907757,
- "r":0.9229957806,
- "f":0.920084122
+ "p":0.9345991561,
+ "r":0.9345991561,
+ "f":0.9345991561
  },
  "case":{
- "p":0.9323383085,
- "r":0.9240631164,
- "f":0.9281822684
+ "p":0.9358717435,
+ "r":0.9211045365,
+ "f":0.9284294235
  },
  "obl":{
- "p":0.7925696594,
- "r":0.7950310559,
- "f":0.7937984496
+ "p":0.8388625592,
+ "r":0.8245341615,
+ "f":0.8316366484
  },
  "cc":{
- "p":0.8746355685,
- "r":0.8720930233,
- "f":0.8733624454
+ "p":0.898255814,
+ "r":0.898255814,
+ "f":0.898255814
  },
  "conj":{
- "p":0.7639257294,
- "r":0.768,
- "f":0.7659574468
+ "p":0.816,
+ "r":0.816,
+ "f":0.816
  },
  "obj":{
- "p":0.9013282732,
- "r":0.9223300971,
- "f":0.9117082534
+ "p":0.9128787879,
+ "r":0.9359223301,
+ "f":0.9242569511
  },
  "aux":{
- "p":0.9104046243,
- "r":0.9183673469,
- "f":0.9143686502
+ "p":0.9176470588,
+ "r":0.9096209913,
+ "f":0.9136163982
  },
  "acl:relcl":{
- "p":0.7,
- "r":0.6810810811,
- "f":0.6904109589
+ "p":0.7932960894,
+ "r":0.7675675676,
+ "f":0.7802197802
  },
  "advmod:lmod":{
- "p":0.8153846154,
- "r":0.7910447761,
- "f":0.803030303
+ "p":0.8805970149,
+ "r":0.8805970149,
+ "f":0.8805970149
  },
  "det":{
- "p":0.9363784666,
+ "p":0.9456342669,
  "r":0.9456342669,
- "f":0.9409836066
+ "f":0.9456342669
  },
  "amod":{
- "p":0.8798646362,
- "r":0.8873720137,
- "f":0.8836023789
+ "p":0.8939393939,
+ "r":0.9061433447,
+ "f":0.9
  },
  "nmod:poss":{
- "p":0.7745098039,
- "r":0.7821782178,
- "f":0.7783251232
+ "p":0.8,
+ "r":0.7920792079,
+ "f":0.7960199005
  },
  "ccomp":{
- "p":0.7301587302,
- "r":0.7419354839,
- "f":0.736
+ "p":0.7105263158,
+ "r":0.8709677419,
+ "f":0.7826086957
  },
  "nummod":{
- "p":0.8429752066,
- "r":0.85,
- "f":0.846473029
+ "p":0.8174603175,
+ "r":0.8583333333,
+ "f":0.837398374
  },
  "flat":{
- "p":0.8625,
- "r":0.9139072848,
- "f":0.8874598071
+ "p":0.8805031447,
+ "r":0.9271523179,
+ "f":0.9032258065
  },
  "compound:prt":{
- "p":0.6764705882,
- "r":0.5609756098,
- "f":0.6133333333
+ "p":0.6666666667,
+ "r":0.6341463415,
+ "f":0.65
  },
  "advcl":{
- "p":0.7413793103,
- "r":0.7413793103,
- "f":0.7413793103
+ "p":0.7699115044,
+ "r":0.75,
+ "f":0.7598253275
  },
  "mark":{
- "p":0.9173553719,
- "r":0.9117043121,
- "f":0.9145211123
+ "p":0.9371069182,
+ "r":0.9178644764,
+ "f":0.9273858921
  },
  "cop":{
- "p":0.901734104,
- "r":0.8914285714,
- "f":0.8965517241
+ "p":0.9152542373,
+ "r":0.9257142857,
+ "f":0.9204545455
  },
  "dep":{
- "p":0.2307692308,
- "r":0.3396226415,
- "f":0.2748091603
+ "p":0.2095238095,
+ "r":0.4150943396,
+ "f":0.2784810127
  },
  "nmod":{
- "p":0.7693920335,
- "r":0.716796875,
- "f":0.7421638018
+ "p":0.784989858,
+ "r":0.755859375,
+ "f":0.7701492537
  },
  "iobj":{
- "p":0.9285714286,
- "r":0.5909090909,
- "f":0.7222222222
+ "p":0.9375,
+ "r":0.6818181818,
+ "f":0.7894736842
  },
  "xcomp":{
- "p":0.6595744681,
- "r":0.5254237288,
- "f":0.5849056604
+ "p":0.7346938776,
+ "r":0.6101694915,
+ "f":0.6666666667
+ },
+ "obl:lmod":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
  },
  "list":{
- "p":0.5,
- "r":0.4444444444,
- "f":0.4705882353
+ "p":0.4,
+ "r":0.3333333333,
+ "f":0.3636363636
  },
  "vocative":{
  "p":0.0,
@@ -500,64 +505,59 @@
  "f":0.0
  },
  "fixed":{
- "p":0.9210526316,
- "r":0.8536585366,
- "f":0.8860759494
+ "p":0.9473684211,
+ "r":0.8780487805,
+ "f":0.9113924051
  },
  "expl":{
- "p":0.9393939394,
+ "p":0.96875,
  "r":0.9117647059,
- "f":0.9253731343
+ "f":0.9393939394
  },
  "appos":{
- "p":0.6315789474,
- "r":0.7272727273,
- "f":0.676056338
+ "p":0.7297297297,
+ "r":0.8181818182,
+ "f":0.7714285714
  },
  "obl:tmod":{
- "p":0.7272727273,
- "r":0.4444444444,
- "f":0.5517241379
+ "p":1.0,
+ "r":0.6111111111,
+ "f":0.7586206897
  },
  "discourse":{
  "p":0.0,
  "r":0.0,
  "f":0.0
- },
- "obl:lmod":{
- "p":0.0,
- "r":0.0,
- "f":0.0
  }
  },
- "lemma_acc":0.9515738499,
- "tag_acc":0.9831444348,
- "ents_p":0.8585657371,
- "ents_r":0.8979166667,
- "ents_f":0.8778004073,
+ "lemma_acc":0.9579661017,
+ "tag_acc":0.9870683392,
+ "ents_p":0.8866396761,
+ "ents_r":0.9125,
+ "ents_f":0.8993839836,
  "ents_per_type":{
+ "MISC":{
+ "p":0.7846153846,
+ "r":0.9026548673,
+ "f":0.8395061728
+ },
  "PER":{
- "p":0.9493670886,
- "r":0.9036144578,
- "f":0.9259259259
+ "p":0.950310559,
+ "r":0.921686747,
+ "f":0.9357798165
  },
  "ORG":{
- "p":0.8720930233,
- "r":0.8333333333,
- "f":0.8522727273
- },
- "MISC":{
- "p":0.7163120567,
- "r":0.8938053097,
- "f":0.7952755906
+ "p":0.8764044944,
+ "r":0.8666666667,
+ "f":0.8715083799
  },
  "LOC":{
- "p":0.8974358974,
+ "p":0.9210526316,
  "r":0.9459459459,
- "f":0.9210526316
+ "f":0.9333333333
  }
  },
- "speed":655.5887888543
+ "speed":523.1005752745
  },
  "sources":[
  {
@@ -580,6 +580,6 @@
  }
  ],
  "requirements":[
- "spacy-transformers>=1.2.2,<1.3.0"
+ "spacy-curated-transformers>=0.2.0,<0.3.0"
  ]
  }
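Every `"f"` value in meta.json is the harmonic mean (F1) of the sibling `"p"` (precision) and `"r"` (recall). A quick sanity check against the updated overall NER figures (`ents_p`, `ents_r`, `ents_f`) from the meta.json diff:

```python
# F1 is the harmonic mean of precision and recall.
def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

# New overall NER precision/recall from meta.json.
ents_p, ents_r = 0.8866396761, 0.9125

# Agrees with the reported "ents_f" of 0.8993839836 to rounding error.
print(f1(ents_p, ents_r))
```

The same relation holds for every per-feature, per-dependency-type, and per-entity-type block in the file, which is a handy consistency check when hand-editing or comparing metric dumps.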
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:574f46da8f1a3e229a7cc9e36b94f7553a485f7f2ff7c2ae299e650beeba5f65
- size 483580
+ oid sha256:861742d9415e0a80d100979596a4ad498b775e5a8641a16f53e8b7625833248f
+ size 483664
ner/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:144da1fbd0e2108b31ba352085a97cb53d27216ca31e9108fae176886094c057
- size 225962
+ oid sha256:6c0844a418557b03f99de94bde947b4955db2fa867945f90f5b4d4a1a9b05272
+ size 226046
parser/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:eeb64c4f7210ff479c0dc4bb40e88cf49bf44f156de24544d0b6194894c421dc
- size 460325
+ oid sha256:3f22af1d9791b6a0f960a1e5ce922b195b012cbb336db44fc40551ad4fd55568
+ size 460409
transformer/cfg CHANGED
@@ -1,3 +1,3 @@
  {
- "max_batch_items":4096
+
  }
transformer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e3fd9206ed262fb41dd70f4c1b6ee884c247f8026afffe7c7bda4bedf8d992dd
- size 502755332
+ oid sha256:96cb3e3b97b02ef81d8042f34b0c160be89ecb7c236a7659c1faa35646facec0
+ size 496570427
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5018aa5b5adda3620458714f53e1e93eca15eb04c8c7c0ae534adeb7541bf917
- size 471010
+ oid sha256:16ca086c283d7cbff4e96efdb151a1f82d10e82a4df47678979f4be6b20e44e1
+ size 469344