oroszgy committed on
Commit
8b8d58a
1 Parent(s): b66e7d5

Update spacy pipeline to 3.5.2

README.md CHANGED
@@ -14,72 +14,72 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.9202821869
18
  - name: NER Recall
19
  type: recall
20
- value: 0.9173699015
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.9188237366
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
- value: 0.9823402728
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
  - name: POS (UPOS) Accuracy
36
  type: accuracy
37
- value: 0.981670256
38
  - task:
39
  name: MORPH
40
  type: token-classification
41
  metrics:
42
  - name: Morph (UFeats) Accuracy
43
  type: accuracy
44
- value: 0.9739172051
45
  - task:
46
  name: LEMMA
47
  type: token-classification
48
  metrics:
49
  - name: Lemma Accuracy
50
  type: accuracy
51
- value: 0.9899531145
52
  - task:
53
  name: UNLABELED_DEPENDENCIES
54
  type: token-classification
55
  metrics:
56
  - name: Unlabeled Attachment Score (UAS)
57
  type: f_score
58
- value: 0.9080729291
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
  - name: Labeled Attachment Score (LAS)
64
  type: f_score
65
- value: 0.8665901043
66
  - task:
67
  name: SENTS
68
  type: token-classification
69
  metrics:
70
  - name: Sentences F-Score
71
  type: f_score
72
- value: 0.9833887043
73
  ---
74
  Hungarian transformer pipeline (XLM-RoBERTa) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner
75
 
76
  | Feature | Description |
77
  | --- | --- |
78
  | **Name** | `hu_core_news_trf_xl` |
79
- | **Version** | `3.5.1` |
80
  | **spaCy** | `>=3.5.0,<3.6.0` |
81
- | **Default Pipeline** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `lemma_smoother`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
82
- | **Components** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `lemma_smoother`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
83
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
84
  | **Sources** | [UD Hungarian Szeged](https://universaldependencies.org/treebanks/hu_szeged/index.html) (Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze (MTA-SZTE Research Group on Artificial Intelligence))<br />[NYTK-NerKor Corpus](https://github.com/nytud/NYTK-NerKor) (Eszter Simon, Noémi Vadász (Department of Language Technology and Applied Linguistics))<br />[hunNERwiki](http://hlt.sztaki.hu/resources/hunnerwiki.html) (Eszter Simon, Dávid Márk Nemeskey (HLT Group, Budapest University of Technology and Economics))<br />[Szeged NER Corpus](https://rgai.inf.u-szeged.hu/node/130) (György Szarvas, Richárd Farkas, László Felföldi, András Kocsor, János Csirik (MTA-SZTE Research Group on Artificial Intelligence))<br />[huBERT base model (cased)](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) (Dávid Márk Nemeskey (SZTAKI-HLT)) |
85
  | **License** | `cc-by-sa-4.0` |
@@ -108,20 +108,20 @@ Hungarian transformer pipeline (XLM-RoBERTa) for HuSpaCy. Components: transforme
108
  | `TOKEN_P` | 99.86 |
109
  | `TOKEN_R` | 99.93 |
110
  | `TOKEN_F` | 99.89 |
111
- | `SENTS_P` | 97.80 |
112
- | `SENTS_R` | 98.89 |
113
- | `SENTS_F` | 98.34 |
114
- | `TAG_ACC` | 98.23 |
115
- | `POS_ACC` | 98.17 |
116
- | `MORPH_ACC` | 97.39 |
117
- | `MORPH_MICRO_P` | 99.16 |
118
- | `MORPH_MICRO_R` | 98.68 |
119
- | `MORPH_MICRO_F` | 98.92 |
120
- | `LEMMA_ACC` | 99.00 |
121
- | `BOUND_DEP_LAS` | 86.78 |
122
- | `BOUND_DEP_UAS` | 90.95 |
123
- | `DEP_UAS` | 90.81 |
124
- | `DEP_LAS` | 86.66 |
125
- | `ENTS_P` | 92.03 |
126
- | `ENTS_R` | 91.74 |
127
- | `ENTS_F` | 91.88 |
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.9149982438
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.9159634318
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.9154805834
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
+ value: 0.981431853
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
  - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.980474732
38
  - task:
39
  name: MORPH
40
  type: token-classification
41
  metrics:
42
  - name: Morph (UFeats) Accuracy
43
  type: accuracy
44
+ value: 0.9659264931
45
  - task:
46
  name: LEMMA
47
  type: token-classification
48
  metrics:
49
  - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.9894746914
52
  - task:
53
  name: UNLABELED_DEPENDENCIES
54
  type: token-classification
55
  metrics:
56
  - name: Unlabeled Attachment Score (UAS)
57
  type: f_score
58
+ value: 0.9112312772
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
  - name: Labeled Attachment Score (LAS)
64
  type: f_score
65
+ value: 0.868695569
66
  - task:
67
  name: SENTS
68
  type: token-classification
69
  metrics:
70
  - name: Sentences F-Score
71
  type: f_score
72
+ value: 0.9933184855
73
  ---
74
  Hungarian transformer pipeline (XLM-RoBERTa) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner
75
 
76
  | Feature | Description |
77
  | --- | --- |
78
  | **Name** | `hu_core_news_trf_xl` |
79
+ | **Version** | `3.5.2` |
80
  | **spaCy** | `>=3.5.0,<3.6.0` |
81
+ | **Default Pipeline** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
82
+ | **Components** | `transformer`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `trainable_lemmatizer`, `experimental_arc_predicter`, `experimental_arc_labeler`, `ner` |
83
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
84
  | **Sources** | [UD Hungarian Szeged](https://universaldependencies.org/treebanks/hu_szeged/index.html) (Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze (MTA-SZTE Research Group on Artificial Intelligence))<br />[NYTK-NerKor Corpus](https://github.com/nytud/NYTK-NerKor) (Eszter Simon, Noémi Vadász (Department of Language Technology and Applied Linguistics))<br />[hunNERwiki](http://hlt.sztaki.hu/resources/hunnerwiki.html) (Eszter Simon, Dávid Márk Nemeskey (HLT Group, Budapest University of Technology and Economics))<br />[Szeged NER Corpus](https://rgai.inf.u-szeged.hu/node/130) (György Szarvas, Richárd Farkas, László Felföldi, András Kocsor, János Csirik (MTA-SZTE Research Group on Artificial Intelligence))<br />[huBERT base model (cased)](https://huggingface.co/SZTAKI-HLT/hubert-base-cc) (Dávid Márk Nemeskey (SZTAKI-HLT)) |
85
  | **License** | `cc-by-sa-4.0` |
108
  | `TOKEN_P` | 99.86 |
109
  | `TOKEN_R` | 99.93 |
110
  | `TOKEN_F` | 99.89 |
111
+ | `SENTS_P` | 99.33 |
112
+ | `SENTS_R` | 99.33 |
113
+ | `SENTS_F` | 99.33 |
114
+ | `TAG_ACC` | 98.14 |
115
+ | `POS_ACC` | 98.05 |
116
+ | `MORPH_ACC` | 96.59 |
117
+ | `MORPH_MICRO_P` | 98.78 |
118
+ | `MORPH_MICRO_R` | 98.36 |
119
+ | `MORPH_MICRO_F` | 98.57 |
120
+ | `LEMMA_ACC` | 98.95 |
121
+ | `BOUND_DEP_LAS` | 86.89 |
122
+ | `BOUND_DEP_UAS` | 91.16 |
123
+ | `DEP_UAS` | 91.12 |
124
+ | `DEP_LAS` | 86.87 |
125
+ | `ENTS_P` | 91.50 |
126
+ | `ENTS_R` | 91.60 |
127
+ | `ENTS_F` | 91.55 |
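The README reports precision, recall and F-score together, so the updated figures can be cross-checked. The sketch below is not part of the commit. It assumes the reported F-score is the usual harmonic mean of precision and recall, and the helper name `f_score` is made up for illustration. The NER values are copied from the old and new sides of the front matter above.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) for NER, copied from the diff above.
NER_OLD = (0.9202821869, 0.9173699015, 0.9188237366)  # version 3.5.1
NER_NEW = (0.9149982438, 0.9159634318, 0.9154805834)  # version 3.5.2

for label, (p, r, reported) in (("3.5.1", NER_OLD), ("3.5.2", NER_NEW)):
    recomputed = f_score(p, r)
    # The inputs are rounded to 10 decimals, so allow a small tolerance.
    print(label, f"{recomputed:.10f}", abs(recomputed - reported) < 1e-8)
```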
config.cfg CHANGED
@@ -1,8 +1,8 @@
1
  [paths]
2
- tagger_model = "models/hu_core_news_trf_xl-tagger-3.5.0/model-best"
3
- parser_model = "models/hu_core_news_trf_xl-parser-3.5.0/model-best"
4
- ner_model = "models/hu_core_news_trf_xl-ner-3.5.0/model-best"
5
- lemmatizer_lookups = "models/hu_core_news_trf_xl-lookup-lemmatizer-3.5.0"
6
  train = null
7
  dev = null
8
  vectors = null
@@ -14,7 +14,7 @@ gpu_allocator = null
14
 
15
  [nlp]
16
  lang = "hu"
17
- pipeline = ["transformer","senter","tagger","morphologizer","lookup_lemmatizer","trainable_lemmatizer","lemma_smoother","experimental_arc_predicter","experimental_arc_labeler","ner"]
18
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
19
  disabled = []
20
  before_creation = null
@@ -30,30 +30,17 @@ scorer = {"@scorers":"spacy-experimental.biaffine_parser_scorer.v1"}
30
 
31
  [components.experimental_arc_labeler.model]
32
  @architectures = "spacy-experimental.Bilinear.v1"
33
- hidden_width = 128
34
- mixed_precision = false
35
  nO = null
36
  dropout = 0.1
37
  grad_scaler = null
38
 
39
  [components.experimental_arc_labeler.model.tok2vec]
40
- @architectures = "spacy-transformers.Tok2VecTransformer.v3"
41
- name = "xlm-roberta-large"
42
- mixed_precision = false
43
- pooling = {"@layers":"reduce_mean.v1"}
44
  grad_factor = 1.0
45
-
46
- [components.experimental_arc_labeler.model.tok2vec.get_spans]
47
- @span_getters = "spacy-transformers.strided_spans.v1"
48
- window = 128
49
- stride = 96
50
-
51
- [components.experimental_arc_labeler.model.tok2vec.grad_scaler_config]
52
-
53
- [components.experimental_arc_labeler.model.tok2vec.tokenizer_config]
54
- use_fast = true
55
-
56
- [components.experimental_arc_labeler.model.tok2vec.transformer_config]
57
 
58
  [components.experimental_arc_predicter]
59
  factory = "experimental_arc_predicter"
@@ -61,33 +48,17 @@ scorer = {"@scorers":"spacy-experimental.biaffine_parser_scorer.v1"}
61
 
62
  [components.experimental_arc_predicter.model]
63
  @architectures = "spacy-experimental.PairwiseBilinear.v1"
64
- hidden_width = 256
65
  nO = 1
66
  mixed_precision = false
67
  dropout = 0.1
68
  grad_scaler = null
69
 
70
  [components.experimental_arc_predicter.model.tok2vec]
71
- @architectures = "spacy-transformers.Tok2VecTransformer.v3"
72
- name = "xlm-roberta-large"
73
- mixed_precision = false
74
- pooling = {"@layers":"reduce_mean.v1"}
75
  grad_factor = 1.0
76
-
77
- [components.experimental_arc_predicter.model.tok2vec.get_spans]
78
- @span_getters = "spacy-transformers.strided_spans.v1"
79
- window = 128
80
- stride = 96
81
-
82
- [components.experimental_arc_predicter.model.tok2vec.grad_scaler_config]
83
-
84
- [components.experimental_arc_predicter.model.tok2vec.tokenizer_config]
85
- use_fast = true
86
-
87
- [components.experimental_arc_predicter.model.tok2vec.transformer_config]
88
-
89
- [components.lemma_smoother]
90
- factory = "hu.lemma_smoother"
91
 
92
  [components.lookup_lemmatizer]
93
  factory = "hu.lookup_lemmatizer"
@@ -145,6 +116,7 @@ stride = 96
145
 
146
  [components.ner.model.tok2vec.tokenizer_config]
147
  use_fast = true
148
 
149
  [components.ner.model.tok2vec.transformer_config]
150
 
@@ -193,10 +165,24 @@ top_k = 3
193
  nO = null
194
 
195
  [components.trainable_lemmatizer.model.tok2vec]
196
- @architectures = "spacy-transformers.TransformerListener.v1"
197
- grad_factor = 1.0
198
- upstream = "transformer"
199
  pooling = {"@layers":"reduce_mean.v1"}
200
 
201
  [components.transformer]
202
  factory = "transformer"
@@ -217,6 +203,7 @@ stride = 96
217
 
218
  [components.transformer.model.tokenizer_config]
219
  use_fast = true
220
 
221
  [components.transformer.model.transformer_config]
222
 
1
  [paths]
2
+ tagger_model = "models/hu_core_news_trf_xl-tagger-3.5.2/model-best"
3
+ parser_model = "models/hu_core_news_trf_xl-parser-3.5.2/model-best"
4
+ ner_model = "models/hu_core_news_trf_xl-ner-3.5.2/model-best"
5
+ lemmatizer_lookups = "models/hu_core_news_trf_xl-lookup-lemmatizer-3.5.2"
6
  train = null
7
  dev = null
8
  vectors = null
14
 
15
  [nlp]
16
  lang = "hu"
17
+ pipeline = ["transformer","senter","tagger","morphologizer","lookup_lemmatizer","trainable_lemmatizer","experimental_arc_predicter","experimental_arc_labeler","ner"]
18
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
19
  disabled = []
20
  before_creation = null
30
 
31
  [components.experimental_arc_labeler.model]
32
  @architectures = "spacy-experimental.Bilinear.v1"
33
+ hidden_width = 256
34
+ mixed_precision = true
35
  nO = null
36
  dropout = 0.1
37
  grad_scaler = null
38
 
39
  [components.experimental_arc_labeler.model.tok2vec]
40
+ @architectures = "spacy-transformers.TransformerListener.v1"
41
  grad_factor = 1.0
42
+ upstream = "transformer"
43
+ pooling = {"@layers":"reduce_mean.v1"}
44
 
45
  [components.experimental_arc_predicter]
46
  factory = "experimental_arc_predicter"
48
 
49
  [components.experimental_arc_predicter.model]
50
  @architectures = "spacy-experimental.PairwiseBilinear.v1"
51
+ hidden_width = 64
52
  nO = 1
53
  mixed_precision = false
54
  dropout = 0.1
55
  grad_scaler = null
56
 
57
  [components.experimental_arc_predicter.model.tok2vec]
58
+ @architectures = "spacy-transformers.TransformerListener.v1"
59
  grad_factor = 1.0
60
+ upstream = "transformer"
61
+ pooling = {"@layers":"reduce_mean.v1"}
62
 
63
  [components.lookup_lemmatizer]
64
  factory = "hu.lookup_lemmatizer"
116
 
117
  [components.ner.model.tok2vec.tokenizer_config]
118
  use_fast = true
119
+ model_max_length = 512
120
 
121
  [components.ner.model.tok2vec.transformer_config]
122
 
165
  nO = null
166
 
167
  [components.trainable_lemmatizer.model.tok2vec]
168
+ @architectures = "spacy-transformers.Tok2VecTransformer.v3"
169
+ name = "xlm-roberta-large"
170
+ mixed_precision = false
171
  pooling = {"@layers":"reduce_mean.v1"}
172
+ grad_factor = 1.0
173
+
174
+ [components.trainable_lemmatizer.model.tok2vec.get_spans]
175
+ @span_getters = "spacy-transformers.strided_spans.v1"
176
+ window = 128
177
+ stride = 96
178
+
179
+ [components.trainable_lemmatizer.model.tok2vec.grad_scaler_config]
180
+
181
+ [components.trainable_lemmatizer.model.tok2vec.tokenizer_config]
182
+ use_fast = true
183
+ model_max_length = 512
184
+
185
+ [components.trainable_lemmatizer.model.tok2vec.transformer_config]
186
 
187
  [components.transformer]
188
  factory = "transformer"
203
 
204
  [components.transformer.model.tokenizer_config]
205
  use_fast = true
206
+ model_max_length = 512
207
 
208
  [components.transformer.model.transformer_config]
209
 
experimental_arc_labeler/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d415fbe4fdcaf6ddf0fa5099ddd349b4b7a22dd45256582537910a3d435260da
- size 2258260957
+ oid sha256:0bf1868928fadb86fd9e90fbd673123607dee68de5d4e02b850ff1174cb5a0f3
+ size 15471467
experimental_arc_predicter/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c5f20dbd745f69e3a6b5f5eb83e61ed1ee11e17b47cc156f8402433dde49b0bc
- size 2256232061
+ oid sha256:f531352a043e651736606e9d3cc586f3d6b79e99eb93b3b3c7c462b4df38e6dd
+ size 544264
hu_core_news_trf_xl-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:15deb669afc3207337ab8984afbed83f3cccd63cee37d4ae046107850169cb63
- size 7379980360
+ oid sha256:19947bd67ef18028f02151c47d1106343afad88fffb7ae322727443f1dd21025
+ size 5554448736
meta.json CHANGED
@@ -1,7 +1,7 @@
1
  {
2
  "lang":"hu",
3
  "name":"core_news_trf_xl",
4
- "version":"3.5.1",
5
  "description":"Hungarian transformer pipeline (XLM-RoBERTa) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner",
6
  "author":"SzegedAI, MILAB",
7
  "email":"gyorgy@orosz.link",
@@ -1187,9 +1187,6 @@
1187
  ],
1188
  "lookup_lemmatizer":[
1189
 
1190
- ],
1191
- "lemma_smoother":[
1192
-
1193
  ],
1194
  "experimental_arc_predicter":[
1195
 
@@ -1261,7 +1258,6 @@
1261
  "morphologizer",
1262
  "lookup_lemmatizer",
1263
  "trainable_lemmatizer",
1264
- "lemma_smoother",
1265
  "experimental_arc_predicter",
1266
  "experimental_arc_labeler",
1267
  "ner"
@@ -1273,7 +1269,6 @@
1273
  "morphologizer",
1274
  "lookup_lemmatizer",
1275
  "trainable_lemmatizer",
1276
- "lemma_smoother",
1277
  "experimental_arc_predicter",
1278
  "experimental_arc_labeler",
1279
  "ner"
@@ -1286,297 +1281,292 @@
1286
  "token_p":0.998565417,
1287
  "token_r":0.9993300153,
1288
  "token_f":0.9989475698,
1289
- "sents_p":0.9779735683,
1290
- "sents_r":0.9888641425,
1291
- "sents_f":0.9833887043,
1292
- "tag_acc":0.9823402728,
1293
- "pos_acc":0.981670256,
1294
- "morph_acc":0.9739172051,
1295
- "morph_micro_p":0.9915792201,
1296
- "morph_micro_r":0.9867640739,
1297
- "morph_micro_f":0.9891657871,
1298
  "morph_per_feat":{
1299
  "Definite":{
1300
- "p":0.9902052239,
1301
- "r":0.9906672888,
1302
- "f":0.9904362025
1303
  },
1304
  "PronType":{
1305
- "p":0.991708126,
1306
- "r":0.9900662252,
1307
- "f":0.9908864954
1308
  },
1309
  "Case":{
1310
- "p":0.995030809,
1311
- "r":0.9891325825,
1312
- "f":0.9920729291
1313
  },
1314
  "Degree":{
1315
- "p":0.9793103448,
1316
- "r":0.9450915141,
1317
- "f":0.9618966977
1318
  },
1319
  "Number":{
1320
- "p":0.996970207,
1321
- "r":0.9926261103,
1322
- "f":0.9947934162
1323
  },
1324
  "Mood":{
1325
- "p":0.9911209767,
1326
- "r":0.9900221729,
1327
- "f":0.9905712701
1328
  },
1329
  "Person":{
1330
- "p":0.9900990099,
1331
- "r":0.9868421053,
1332
- "f":0.9884678748
1333
  },
1334
  "Tense":{
1335
- "p":0.9988901221,
1336
- "r":0.9944751381,
1337
- "f":0.9966777409
1338
  },
1339
  "VerbForm":{
1340
- "p":0.9854014599,
1341
- "r":0.9743384122,
1342
- "f":0.9798387097
1343
  },
1344
  "Voice":{
1345
- "p":0.9866939611,
1346
  "r":0.9856850716,
1347
- "f":0.9861892583
1348
  },
1349
  "Number[psor]":{
1350
- "p":0.9943181818,
1351
- "r":0.9971509972,
1352
- "f":0.9957325747
1353
  },
1354
  "Person[psor]":{
1355
- "p":0.9928977273,
1356
- "r":0.997146933,
1357
- "f":0.9950177936
1358
  },
1359
  "NumType":{
1360
- "p":0.9776674938,
1361
- "r":0.9609756098,
1362
- "f":0.9692496925
1363
  },
1364
  "Reflex":{
1365
- "p":1.0,
1366
- "r":0.75,
1367
- "f":0.8571428571
1368
- },
1369
- "Reflexive":{
1370
- "p":0.0,
1371
- "r":0.0,
1372
- "f":0.0
1373
  },
1374
  "Aspect":{
1375
  "p":1.0,
1376
  "r":0.25,
1377
  "f":0.4
1378
  },
1379
- "NumType[sem]":{
1380
- "p":0.0,
1381
- "r":0.0,
1382
- "f":0.0
1383
  },
1384
  "Number[psed]":{
1385
  "p":1.0,
1386
- "r":0.5555555556,
1387
- "f":0.7142857143
1388
- },
1389
- "Poss":{
1390
- "p":1.0,
1391
- "r":1.0,
1392
- "f":1.0
1393
  }
1394
  },
1395
- "lemma_acc":0.9899531145,
1396
- "bound_dep_las":0.8677780972,
1397
- "bound_dep_uas":0.9094567404,
1398
- "dep_uas":0.9080729291,
1399
- "dep_las":0.8665901043,
1400
  "dep_las_per_type":{
1401
  "415":{
1402
- "p":0.9248826291,
1403
- "r":0.9410828025,
1404
- "f":0.9329123915
1405
  },
1406
  "7411097074813287689":{
1407
- "p":0.9019920319,
1408
- "r":0.9255928046,
1409
- "f":0.9136400323
1410
  },
1411
  "429":{
1412
- "p":0.9110764431,
1413
- "r":0.9125,
1414
- "f":0.9117876659
1415
  },
1416
  "15861261214731031920":{
1417
- "p":0.7163461538,
1418
- "r":0.7303921569,
1419
- "f":0.7233009709
1420
  },
1421
  "991268021520064439":{
1422
- "p":0.8756302521,
1423
- "r":0.8830508475,
1424
- "f":0.8793248945
1425
  },
1426
  "435":{
1427
- "p":0.9028156222,
1428
- "r":0.8946894689,
1429
- "f":0.8987341772
1430
  },
1431
  "434":{
1432
- "p":0.953539823,
1433
- "r":0.9685393258,
1434
- "f":0.9609810479
1435
  },
1436
  "8206900633647566924":{
1437
- "p":0.8923395445,
1438
- "r":0.9599109131,
1439
- "f":0.9248927039
1440
  },
1441
  "407":{
1442
- "p":0.8443496802,
1443
  "r":0.8336842105,
1444
- "f":0.8389830508
1445
  },
1446
  "410":{
1447
- "p":0.75,
1448
- "r":0.7625,
1449
- "f":0.7561983471
1450
  },
1451
  "445":{
1452
- "p":0.8609226594,
1453
- "r":0.8580121704,
1454
- "f":0.8594649509
1455
  },
1456
  "400":{
1457
- "p":0.8571428571,
1458
- "r":0.8842105263,
1459
- "f":0.8704663212
1460
  },
1461
  "17772752594865228322":{
1462
- "p":0.9669811321,
1463
  "r":0.9579439252,
1464
- "f":0.9624413146
1465
  },
1466
  "403":{
1467
- "p":0.6413043478,
1468
- "r":0.6276595745,
1469
- "f":0.6344086022
1470
  },
1471
  "399":{
1472
- "p":0.6132075472,
1473
- "r":0.6632653061,
1474
- "f":0.637254902
1475
  },
1476
  "3143985677199705895":{
1477
- "p":0.784,
1478
- "r":0.852173913,
1479
- "f":0.8166666667
1480
  },
1481
  "9241468201421778905":{
1482
- "p":0.5789473684,
1483
- "r":0.6666666667,
1484
- "f":0.6197183099
1485
  },
1486
  "423":{
1487
- "p":0.9496855346,
1488
- "r":0.9556962025,
1489
- "f":0.952681388
1490
  },
1491
  "13543738850102096385":{
1492
- "p":0.9541284404,
1493
- "r":0.9541284404,
1494
- "f":0.9541284404
1495
  },
1496
  "10901028881100056900":{
1497
- "p":0.7352941176,
1498
- "r":0.78125,
1499
- "f":0.7575757576
1500
  },
1501
  "411":{
1502
- "p":0.8648648649,
1503
- "r":0.7804878049,
1504
- "f":0.8205128205
1505
  },
1506
  "12549387360942434255":{
1507
- "p":0.5897435897,
1508
- "r":0.575,
1509
- "f":0.582278481
1510
  },
1511
  "303601073839818384":{
1512
  "p":0.5,
1513
- "r":0.125,
1514
- "f":0.2
1515
  },
1516
  "8884235091647096537":{
1517
- "p":0.0,
1518
- "r":0.0,
1519
- "f":0.0
1520
  },
1521
  "2249809950233855422":{
1522
- "p":0.7222222222,
1523
- "r":0.8125,
1524
- "f":0.7647058824
1525
  },
1526
  "422":{
1527
- "p":0.4,
1528
- "r":0.6666666667,
1529
- "f":0.5
1530
  },
1531
  "8110129090154140942":{
1532
- "p":0.9536082474,
1533
- "r":0.943877551,
1534
- "f":0.9487179487
1535
  },
1536
  "412":{
1537
- "p":0.6896551724,
1538
- "r":0.5405405405,
1539
- "f":0.6060606061
1540
  },
1541
  "436":{
1542
- "p":0.3928571429,
1543
- "r":0.1506849315,
1544
- "f":0.2178217822
1545
  },
1546
  "450":{
1547
- "p":0.9473684211,
1548
  "r":0.972972973,
1549
- "f":0.96
1550
  },
1551
  "12837356684637874264":{
1552
- "p":0.765625,
1553
- "r":0.5268817204,
1554
- "f":0.6242038217
1555
  },
1556
  "451":{
1557
- "p":0.5915492958,
1558
- "r":0.5833333333,
1559
- "f":0.5874125874
1560
  },
1561
  "7349492218059511525":{
1562
- "p":0.5714285714,
1563
- "r":0.4,
1564
- "f":0.4705882353
1565
  },
1566
  "426":{
1567
- "p":0.625,
1568
  "r":0.4545454545,
1569
- "f":0.5263157895
1570
  },
1571
  "405":{
1572
- "p":0.9166666667,
1573
- "r":0.9166666667,
1574
- "f":0.9166666667
1575
  },
1576
  "17865338459503383721":{
1577
  "p":1.0,
1578
- "r":0.3333333333,
1579
- "f":0.5
1580
  },
1581
  "17311980334327143026":{
1582
  "p":0.0,
@@ -1584,14 +1574,9 @@
1584
  "f":0.0
1585
  },
1586
  "7037928807040764755":{
1587
- "p":0.975,
1588
- "r":0.975,
1589
- "f":0.975
1590
- },
1591
- "408":{
1592
- "p":0.1428571429,
1593
- "r":0.0769230769,
1594
- "f":0.1
1595
  },
1596
  "11190527879068114961":{
1597
  "p":0.0,
@@ -1599,24 +1584,24 @@
1599
  "f":0.0
1600
  },
1601
  "3350290345017230236":{
1602
- "p":0.1666666667,
1603
- "r":0.0833333333,
1604
- "f":0.1111111111
1605
- },
1606
- "10069665988847657778":{
1607
  "p":0.0,
1608
  "r":0.0,
1609
  "f":0.0
1610
  },
1611
  "17473201795025412735":{
1612
- "p":1.0,
1613
  "r":0.1666666667,
1614
- "f":0.2857142857
1615
  },
1616
  "6522094215780122214":{
1617
- "p":0.8,
1618
  "r":1.0,
1619
- "f":0.8888888889
1620
  },
1621
  "203073658115086772":{
1622
  "p":0.0,
@@ -1624,32 +1609,32 @@
1624
  "f":0.0
1625
  }
1626
  },
1627
- "ents_p":0.9202821869,
1628
- "ents_r":0.9173699015,
1629
- "ents_f":0.9188237366,
1630
  "ents_per_type":{
1631
  "ORG":{
1632
- "p":0.9381395349,
1633
- "r":0.9350950394,
1634
- "f":0.9366148131
1635
  },
1636
  "PER":{
1637
- "p":0.9494888755,
1638
- "r":0.9432497013,
1639
- "f":0.9463590051
1640
  },
1641
  "LOC":{
1642
- "p":0.9285091543,
1643
- "r":0.9244791667,
1644
- "f":0.9264897782
1645
  },
1646
  "MISC":{
1647
- "p":0.7845070423,
1648
- "r":0.790070922,
1649
- "f":0.7872791519
1650
  }
1651
  },
1652
- "speed":1581.862775048
1653
  },
1654
  "sources":[
1655
  {
1
  {
2
  "lang":"hu",
3
  "name":"core_news_trf_xl",
4
+ "version":"3.5.2",
5
  "description":"Hungarian transformer pipeline (XLM-RoBERTa) for HuSpaCy. Components: transformer, senter, tagger, morphologizer, lemmatizer, parser, ner",
6
  "author":"SzegedAI, MILAB",
7
  "email":"gyorgy@orosz.link",
1187
  ],
1188
  "lookup_lemmatizer":[
1189
 
1190
  ],
1191
  "experimental_arc_predicter":[
1192
 
1258
  "morphologizer",
1259
  "lookup_lemmatizer",
1260
  "trainable_lemmatizer",
1261
  "experimental_arc_predicter",
1262
  "experimental_arc_labeler",
1263
  "ner"
1269
  "morphologizer",
1270
  "lookup_lemmatizer",
1271
  "trainable_lemmatizer",
1272
  "experimental_arc_predicter",
1273
  "experimental_arc_labeler",
1274
  "ner"
1281
  "token_p":0.998565417,
1282
  "token_r":0.9993300153,
1283
  "token_f":0.9989475698,
1284
+ "sents_p":0.9933184855,
1285
+ "sents_r":0.9933184855,
1286
+ "sents_f":0.9933184855,
1287
+ "tag_acc":0.981431853,
1288
+ "pos_acc":0.980474732,
1289
+ "morph_acc":0.9659264931,
1290
+ "morph_micro_p":0.9877869843,
1291
+ "morph_micro_r":0.9836269875,
1292
+ "morph_micro_f":0.9857025968,
1293
  "morph_per_feat":{
1294
  "Definite":{
1295
+ "p":0.9865491651,
1296
+ "r":0.9925338311,
1297
+ "f":0.9895324494
1298
  },
1299
  "PronType":{
1300
+ "p":0.9889258029,
1301
+ "r":0.9856512141,
1302
+ "f":0.9872857933
1303
  },
1304
  "Case":{
1305
+ "p":0.9928486293,
1306
+ "r":0.9875518672,
1307
+ "f":0.9901931649
1308
  },
1309
  "Degree":{
1310
+ "p":0.9625212947,
1311
+ "r":0.9400998336,
1312
+ "f":0.9511784512
1313
  },
1314
  "Number":{
1315
+ "p":0.99545684,
1316
+ "r":0.9914529915,
1317
+ "f":0.9934508816
1318
  },
1319
  "Mood":{
1320
+ "p":0.9834254144,
1321
+ "r":0.9866962306,
1322
+ "f":0.9850581074
1323
  },
1324
  "Person":{
1325
+ "p":0.9851239669,
1326
+ "r":0.9802631579,
1327
+ "f":0.9826875515
1328
  },
1329
  "Tense":{
1330
+ "p":0.9889624724,
1331
+ "r":0.9900552486,
1332
+ "f":0.9895085588
1333
  },
1334
  "VerbForm":{
1335
+ "p":0.9813614263,
1336
+ "r":0.9711307137,
1337
+ "f":0.9762192664
1338
  },
1339
  "Voice":{
1340
+ "p":0.9826707441,
1341
  "r":0.9856850716,
1342
+ "f":0.9841755998
1343
  },
1344
  "Number[psor]":{
1345
+ "p":0.9885057471,
1346
+ "r":0.9800569801,
1347
+ "f":0.9842632332
1348
  },
1349
  "Person[psor]":{
1350
+ "p":0.9899425287,
1351
+ "r":0.9828815977,
1352
+ "f":0.9863994273
1353
  },
1354
  "NumType":{
1355
+ "p":0.934939759,
1356
+ "r":0.9463414634,
1357
+ "f":0.9406060606
1358
  },
1359
  "Reflex":{
1360
+ "p":0.875,
1361
+ "r":0.875,
1362
+ "f":0.875
1363
  },
1364
  "Aspect":{
1365
  "p":1.0,
1366
  "r":0.25,
1367
  "f":0.4
1368
  },
1369
+ "Poss":{
1370
+ "p":0.75,
1371
+ "r":1.0,
1372
+ "f":0.8571428571
1373
  },
1374
  "Number[psed]":{
1375
  "p":1.0,
1376
+ "r":0.3333333333,
1377
+ "f":0.5
1378
  }
1379
  },
1380
+ "lemma_acc":0.9894746914,
1381
+ "bound_dep_las":0.8688869748,
1382
+ "bound_dep_uas":0.9115704852,
1383
+ "dep_uas":0.9112312772,
1384
+ "dep_las":0.868695569,
1385
  "dep_las_per_type":{
1386
  "415":{
1387
+ "p":0.9375494071,
1388
+ "r":0.9442675159,
1389
+ "f":0.9408964697
1390
  },
1391
  "7411097074813287689":{
1392
+ "p":0.9214113873,
1393
+ "r":0.9394930499,
1394
+ "f":0.9303643725
1395
  },
1396
  "429":{
1397
+ "p":0.9213836478,
1398
+ "r":0.915625,
1399
+ "f":0.9184952978
1400
  },
1401
  "15861261214731031920":{
1402
+ "p":0.7578125,
1403
+ "r":0.7132352941,
1404
+ "f":0.7348484848
1405
  },
1406
  "991268021520064439":{
1407
+ "p":0.8987993139,
1408
+ "r":0.8881355932,
1409
+ "f":0.8934356351
1410
  },
1411
  "435":{
1412
+ "p":0.8863636364,
1413
+ "r":0.9126912691,
1414
+ "f":0.8993348115
1415
  },
1416
  "434":{
1417
+ "p":0.9516483516,
1418
+ "r":0.9730337079,
1419
+ "f":0.9622222222
1420
  },
1421
  "8206900633647566924":{
1422
+ "p":0.853515625,
1423
+ "r":0.9732739421,
1424
+ "f":0.9094693028
1425
  },
1426
  "407":{
1427
+ "p":0.8267223382,
1428
  "r":0.8336842105,
1429
+ "f":0.8301886792
1430
  },
1431
  "410":{
1432
+ "p":0.7733050847,
1433
+ "r":0.7604166667,
1434
+ "f":0.7668067227
1435
  },
1436
  "445":{
1437
+ "p":0.8590694538,
1438
+ "r":0.861392833,
1439
+ "f":0.8602295746
1440
  },
1441
  "400":{
1442
+ "p":0.8453608247,
1443
+ "r":0.8631578947,
1444
+ "f":0.8541666667
1445
  },
1446
  "17772752594865228322":{
1447
+ "p":0.9534883721,
1448
  "r":0.9579439252,
1449
+ "f":0.9557109557
1450
  },
1451
  "403":{
1452
+ "p":0.5909090909,
1453
+ "r":0.5531914894,
1454
+ "f":0.5714285714
1455
  },
1456
  "399":{
1457
+ "p":0.5247524752,
1458
+ "r":0.5408163265,
1459
+ "f":0.5326633166
1460
  },
1461
  "3143985677199705895":{
1462
+ "p":0.7807692308,
1463
+ "r":0.8826086957,
1464
+ "f":0.8285714286
1465
  },
1466
  "9241468201421778905":{
1467
+ "p":0.4146341463,
1468
+ "r":0.5151515152,
1469
+ "f":0.4594594595
1470
  },
1471
  "423":{
1472
+ "p":0.949044586,
1473
+ "r":0.9430379747,
1474
+ "f":0.946031746
1475
  },
1476
  "13543738850102096385":{
1477
+ "p":0.9633027523,
1478
+ "r":0.9633027523,
1479
+ "f":0.9633027523
1480
  },
1481
  "10901028881100056900":{
1482
+ "p":0.8275862069,
1483
+ "r":0.75,
1484
+ "f":0.7868852459
1485
  },
1486
  "411":{
1487
+ "p":0.8461538462,
1488
+ "r":0.8048780488,
1489
+ "f":0.825
1490
  },
1491
  "12549387360942434255":{
1492
+ "p":0.4857142857,
1493
+ "r":0.425,
1494
+ "f":0.4533333333
1495
  },
1496
  "303601073839818384":{
1497
  "p":0.5,
1498
+ "r":0.375,
1499
+ "f":0.4285714286
1500
  },
1501
  "8884235091647096537":{
1502
+ "p":0.5,
1503
+ "r":0.1666666667,
1504
+ "f":0.25
1505
  },
1506
  "2249809950233855422":{
1507
+ "p":0.5357142857,
1508
+ "r":0.46875,
1509
+ "f":0.5
1510
  },
1511
  "422":{
1512
+ "p":0.4137931034,
1513
+ "r":0.8,
1514
+ "f":0.5454545455
1515
+ },
1516
+ "408":{
1517
+ "p":0.0,
1518
+ "r":0.0,
1519
+ "f":0.0
1520
  },
1521
  "8110129090154140942":{
1522
+ "p":0.9896907216,
1523
+ "r":0.9795918367,
1524
+ "f":0.9846153846
1525
  },
1526
  "412":{
1527
+ "p":0.5714285714,
1528
+ "r":0.4324324324,
1529
+ "f":0.4923076923
1530
  },
1531
  "436":{
1532
+ "p":0.3125,
1533
+ "r":0.0684931507,
1534
+ "f":0.1123595506
1535
  },
1536
  "450":{
1537
+ "p":0.9350649351,
1538
  "r":0.972972973,
1539
+ "f":0.9536423841
1540
  },
1541
  "12837356684637874264":{
1542
+ "p":0.7564102564,
1543
+ "r":0.6344086022,
1544
+ "f":0.6900584795
1545
  },
1546
  "451":{
1547
+ "p":0.578125,
1548
+ "r":0.5138888889,
1549
+ "f":0.5441176471
1550
  },
1551
  "7349492218059511525":{
1552
+ "p":0.8,
1553
+ "r":0.8,
1554
+ "f":0.8
1555
  },
1556
  "426":{
1557
+ "p":0.7142857143,
1558
  "r":0.4545454545,
1559
+ "f":0.5555555556
1560
  },
1561
  "405":{
1562
+ "p":0.9090909091,
1563
+ "r":0.8333333333,
1564
+ "f":0.8695652174
1565
  },
1566
  "17865338459503383721":{
1567
  "p":1.0,
1568
+ "r":0.1666666667,
1569
+ "f":0.2857142857
1570
  },
1571
  "17311980334327143026":{
1572
  "p":0.0,
1574
  "f":0.0
1575
  },
1576
  "7037928807040764755":{
1577
+ "p":1.0,
1578
+ "r":1.0,
1579
+ "f":1.0
1580
  },
1581
  "11190527879068114961":{
1582
  "p":0.0,
1584
  "f":0.0
1585
  },
1586
  "3350290345017230236":{
1587
  "p":0.0,
1588
  "r":0.0,
1589
  "f":0.0
1590
  },
1591
  "17473201795025412735":{
1592
+ "p":0.2,
1593
  "r":0.1666666667,
1594
+ "f":0.1818181818
1595
+ },
1596
+ "10069665988847657778":{
1597
+ "p":0.0,
1598
+ "r":0.0,
1599
+ "f":0.0
1600
  },
1601
  "6522094215780122214":{
1602
+ "p":1.0,
1603
  "r":1.0,
1604
+ "f":1.0
1605
  },
1606
  "203073658115086772":{
1607
  "p":0.0,
1609
  "f":0.0
1610
  }
1611
  },
1612
+ "ents_p":0.9149982438,
1613
+ "ents_r":0.9159634318,
1614
+ "ents_f":0.9154805834,
1615
  "ents_per_type":{
1616
  "ORG":{
1617
+ "p":0.9283402681,
1618
+ "r":0.9309225777,
1619
+ "f":0.9296296296
1620
  },
1621
  "PER":{
1622
+ "p":0.9412114014,
1623
+ "r":0.9468339307,
1624
+ "f":0.9440142942
1625
  },
1626
  "LOC":{
1627
+ "p":0.9278887924,
1628
+ "r":0.9270833333,
1629
+ "f":0.927485888
1630
  },
1631
  "MISC":{
1632
+ "p":0.7887931034,
1633
+ "r":0.7787234043,
1634
+ "f":0.7837259101
1635
  }
1636
  },
1637
+ "speed":2317.5573317177
1638
  },
1639
  "sources":[
1640
  {
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:abc78750b871c4dcba7fc69138f751747147f384ef1932c7903405d857cdbf1b
+ oid sha256:ca1bd4d8c5e2185843bd553a592a8b6a1816862b6a13db0cf9515671bce3aa9f
  size 4695153
ner/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:67ea241f24bface3c6bd3484e9a45e03825d3cdac9b56e3d7d4b4587241aba7a
- size 2254213831
+ oid sha256:cd4c145ddf367e2cf83cd534fef01c8b5da2c90c2f0c08a8e527ecb48970e95e
+ size 2262217095
senter/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:36c2c139648af12aa8b8de12dbd710958e6ff770273491f1567217a3b933314d
+ oid sha256:b06e9dfcc8aa4dff4b4b00bba830a8445b911edf8a163b37f586fe964c82cc41
  size 8840
tagger/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d44433d7e6042d0141a311830ee505fded269037c14e759091925bd7be1b8763
+ oid sha256:2ca5c947cb64d16c48c71878b60742843cdd6137dcbf4f2d8cb594afd9a0dc7f
  size 70342
trainable_lemmatizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6f143786db9d209fb1c906d068ffb49a0a5e851bca711e3c1d7cbd2524b62154
- size 16470353
+ oid sha256:c5d9f7bb1af0b65dd2976ca7eaadafb4deca10f54acb6099b2978321f5419ef1
+ size 2278339716
transformer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8a9424952a25dd8baa52bbd1931dd605a46b08b2c71dc092eda66bcb1ad5090f
- size 2253866087
+ oid sha256:90d1425b78999a38e46c706823582badec4518aa2d9a271a240035605c9b8793
+ size 2261869351
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a2441f7028cc80abc31b949efc629df97d2ad46310bfbb7debd601f09c61857b
- size 6393242
+ oid sha256:b83edc14b03d35ed1c7b3e3b4c4bbcc82ccac1a32a30f2b5302fb6706f57b2c4
+ size 6393481
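The Git LFS pointer diffs in this commit record only the old and new byte sizes of each binary. The sketch below is not part of the commit. It tabulates the sizes that changed, copied from the `size` lines of the pointer files in this diff, to make the shift in where the transformer weights live easier to see. The helper names `size_delta` and `to_gib` are illustrative.

```python
# (old_bytes, new_bytes) copied from the LFS pointer "size" lines in this diff.
SIZES = {
    "experimental_arc_labeler/model": (2258260957, 15471467),
    "experimental_arc_predicter/model": (2256232061, 544264),
    "trainable_lemmatizer/model": (16470353, 2278339716),
    "ner/model": (2254213831, 2262217095),
    "transformer/model": (2253866087, 2261869351),
    "hu_core_news_trf_xl-any-py3-none-any.whl": (7379980360, 5554448736),
}


def size_delta(old: int, new: int) -> int:
    """Signed change in bytes (negative means the file shrank)."""
    return new - old


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


for path, (old, new) in SIZES.items():
    print(f"{path}: {to_gib(old):.2f} GiB -> {to_gib(new):.2f} GiB "
          f"({size_delta(old, new):+d} bytes)")
```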