osanseviero HF staff committed on
Commit
ddcdcad
1 Parent(s): a9d6412

Update spaCy pipeline

LICENSES_SOURCES CHANGED
@@ -1,4 +1,4 @@
- # UD Danish DDT v2.5
+ # UD Danish DDT v2.8
 
  * Author: Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara
  * URL: https://github.com/UniversalDependencies/UD_Danish-DDT
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
4
  - token-classification
5
  language:
6
  - da
7
- license: CC-BY-SA-4.0
8
  model-index:
9
  - name: da_core_news_sm
10
  results:
@@ -14,47 +14,47 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.7439824945
18
  - name: NER Recall
19
  type: recall
20
- value: 0.7083333333
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.7257203842
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
  - name: POS Accuracy
29
  type: accuracy
30
- value: 0.952251816
31
  - task:
32
  name: SENTER
33
  type: token-classification
34
  metrics:
35
  - name: SENTER Precision
36
  type: precision
37
- value: 0.8375
38
  - name: SENTER Recall
39
  type: recall
40
- value: 0.8315602837
41
  - name: SENTER F Score
42
  type: f_score
43
- value: 0.834519573
44
  - task:
45
  name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
  - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.7983240223
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
  - name: Labeled Dependencies Accuracy
56
  type: accuracy
57
- value: 0.7983240223
58
  ---
59
  ### Details: https://spacy.io/models/da#da_core_news_sm
60
 
@@ -63,12 +63,12 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `da_core_news_sm` |
66
- | **Version** | `3.1.0` |
67
- | **spaCy** | `>=3.1.0,<3.2.0` |
68
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
69
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
70
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
71
- | **Sources** | [UD Danish DDT v2.5](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Lemmatization Lists](https://github.com/michmech/lemmatization-lists/) (Michal Měchura) |
72
  | **License** | `CC BY-SA 4.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
74
 
@@ -76,12 +76,12 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
76
 
77
  <details>
78
 
79
- <summary>View label scheme (194 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
  | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
84
- | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:loc`, `obl:tmod`, `punct`, `xcomp` |
85
  | **`senter`** | `I`, `S` |
86
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
87
 
@@ -92,15 +92,21 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
92
  | Type | Score |
93
  | --- | --- |
94
  | `TOKEN_ACC` | 99.95 |
95
- | `TAG_ACC` | 95.23 |
96
- | `POS_ACC` | 95.23 |
97
- | `MORPH_ACC` | 93.85 |
98
  | `LEMMA_ACC` | 84.91 |
99
- | `DEP_UAS` | 79.83 |
100
- | `DEP_LAS` | 75.32 |
101
- | `ENTS_P` | 74.40 |
102
- | `ENTS_R` | 70.83 |
103
- | `ENTS_F` | 72.57 |
104
- | `SENTS_P` | 83.75 |
105
- | `SENTS_R` | 83.16 |
106
- | `SENTS_F` | 83.45 |
4
  - token-classification
5
  language:
6
  - da
7
+ license: cc-by-sa-4.0
8
  model-index:
9
  - name: da_core_news_sm
10
  results:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.7682119205
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.725
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.7459807074
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
  - name: POS Accuracy
29
  type: accuracy
30
+ value: 0.9507990315
31
  - task:
32
  name: SENTER
33
  type: token-classification
34
  metrics:
35
  - name: SENTER Precision
36
  type: precision
37
+ value: 0.9045045045
38
  - name: SENTER Recall
39
  type: recall
40
+ value: 0.890070922
41
  - name: SENTER F Score
42
  type: f_score
43
+ value: 0.8972296693
44
  - task:
45
  name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
  - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
+ value: 0.8070861741
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
  - name: Labeled Dependencies Accuracy
56
  type: accuracy
57
+ value: 0.8070861741
58
  ---
59
  ### Details: https://spacy.io/models/da#da_core_news_sm
60
 
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `da_core_news_sm` |
66
+ | **Version** | `3.2.0` |
67
+ | **spaCy** | `>=3.2.0,<3.3.0` |
68
  | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
69
  | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
70
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
71
+ | **Sources** | [UD Danish DDT v2.8](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Lemmatization Lists](https://github.com/michmech/lemmatization-lists/) (Michal Měchura) |
72
  | **License** | `CC BY-SA 4.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
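
The **spaCy** row above changes the supported range from `>=3.1.0,<3.2.0` to `>=3.2.0,<3.3.0`. As a minimal sketch of what such a range means, the following checks a dotted version string against a comma-separated constraint. The helper names `parse_version` and `satisfies` are hypothetical, and the parsing is deliberately simplified (plain numeric releases only, no pre-release or wildcard handling), so it is an illustration and not how spaCy or pip resolve versions.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '3.2.0' into (3, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Return True if `version` meets every clause in `spec`."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in OPS:
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


NEW_RANGE = ">=3.2.0,<3.3.0"  # copied from the "+" side of the diff
for installed in ("3.1.0", "3.2.4", "3.3.0"):
    print(installed, satisfies(installed, NEW_RANGE))
```

Under this reading, a pipeline built for the old range would not be accepted by a spaCy 3.2.x install, which is presumably why the pipeline version moves to `3.2.0` in the same commit.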
74
 
76
 
77
  <details>
78
 
79
+ <summary>View label scheme (195 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
  | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
84
+ | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
85
  | **`senter`** | `I`, `S` |
86
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
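
The only label row marked as changed in this hunk is `parser`. A short sketch, using the label lists copied from the `-` and `+` rows above, shows which labels were added and removed. The function name `label_changes` is a hypothetical helper introduced for illustration.

```python
# Parser labels copied from the "-" (old) and "+" (new) rows of the diff.
OLD_PARSER = set(
    "ROOT acl:relcl advcl advmod amod appos aux case cc ccomp compound:prt "
    "conj cop dep det expl fixed flat iobj list mark nmod nmod:poss nsubj "
    "nummod obj obl obl:loc obl:tmod punct xcomp".split()
)
NEW_PARSER = set(
    "ROOT acl:relcl advcl advmod advmod:lmod amod appos aux case cc ccomp "
    "compound:prt conj cop dep det expl fixed flat iobj list mark nmod "
    "nmod:poss nsubj nummod obj obl obl:lmod obl:tmod punct xcomp".split()
)


def label_changes(old, new):
    """Return (added, removed) labels as sorted lists."""
    return sorted(new - old), sorted(old - new)


added, removed = label_changes(OLD_PARSER, NEW_PARSER)
print("added:  ", added)
print("removed:", removed)
print("net change:", len(NEW_PARSER) - len(OLD_PARSER))
```

The net change of one parser label is consistent with the summary line moving from 194 to 195 labels, assuming the other three components' label sets are unchanged, as the diff suggests.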
87
 
92
  | Type | Score |
93
  | --- | --- |
94
  | `TOKEN_ACC` | 99.95 |
95
+ | `TOKEN_P` | 99.78 |
96
+ | `TOKEN_R` | 99.75 |
97
+ | `TOKEN_F` | 99.76 |
98
+ | `POS_ACC` | 95.08 |
99
+ | `MORPH_ACC` | 93.71 |
100
+ | `MORPH_MICRO_P` | 95.63 |
101
+ | `MORPH_MICRO_R` | 94.83 |
102
+ | `MORPH_MICRO_F` | 95.23 |
103
+ | `SENTS_P` | 90.45 |
104
+ | `SENTS_R` | 89.01 |
105
+ | `SENTS_F` | 89.72 |
106
+ | `DEP_UAS` | 80.71 |
107
+ | `DEP_LAS` | 76.41 |
108
+ | `TAG_ACC` | 95.08 |
109
  | `LEMMA_ACC` | 84.91 |
110
+ | `ENTS_P` | 76.82 |
111
+ | `ENTS_R` | 72.50 |
112
+ | `ENTS_F` | 74.60 |
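
The table scores appear to be the raw fractions from the metadata and `accuracy.json`, expressed as percentages with two decimals, and each F score appears to be the harmonic mean of its precision and recall. The sketch below recomputes two of them from values copied out of the `+` side of this commit. The helper names and the mapping of the two dependency entries to `dep_uas` and `dep_las` are assumptions made for illustration.

```python
def f_score(precision, recall):
    """F1: harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def as_percent(fraction):
    """Convert a 0-1 score to a percentage rounded to two decimals."""
    return round(fraction * 100, 2)


# Values copied from the "+" side of this diff.
NER = {"p": 0.7682119205, "r": 0.725, "f": 0.7459807074}
SENTS = {"p": 0.9045045045, "r": 0.890070922, "f": 0.8972296693}

for name, scores in (("ENTS", NER), ("SENTS", SENTS)):
    recomputed = f_score(scores["p"], scores["r"])
    agrees = abs(recomputed - scores["f"]) < 1e-6
    print(name, as_percent(recomputed), "matches reported F:", agrees)

# Assumption: "Unlabeled Dependencies Accuracy" in the model-index is meant
# to carry dep_uas, and "Labeled Dependencies Accuracy" dep_las.
MODEL_INDEX = {"dep_uas": 0.8070861741, "dep_las": 0.8070861741}
ACCURACY_JSON = {"dep_uas": 0.8070861741, "dep_las": 0.7640549905}

mismatched = sorted(
    key for key in MODEL_INDEX
    if abs(MODEL_INDEX[key] - ACCURACY_JSON[key]) > 1e-9
)
print("mismatched keys:", mismatched)
```

Under that assumed mapping, the comparison flags `dep_las`: the model-index lists the same value for labeled and unlabeled dependencies, while the table above reports `DEP_LAS` as 76.41 and `accuracy.json` records `0.7640549905`.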
 
 
 
 
 
accuracy.json CHANGED
@@ -1,58 +1,53 @@
1
  {
2
  "token_acc": 0.9994672349,
3
- "tag_acc": 0.952251816,
4
- "pos_acc": 0.952251816,
5
- "morph_acc": 0.9384987893,
6
- "lemma_acc": 0.8491041162,
7
- "dep_uas": 0.7983240223,
8
- "dep_las": 0.7531843575,
9
- "ents_p": 0.7439824945,
10
- "ents_r": 0.7083333333,
11
- "ents_f": 0.7257203842,
12
- "sents_p": 0.8375,
13
- "sents_r": 0.8315602837,
14
- "sents_f": 0.834519573,
15
- "speed": 11486.3761387023,
16
  "morph_per_feat": {
17
  "Mood": {
18
- "p": 0.9675881792,
19
- "r": 0.9675881792,
20
- "f": 0.9675881792
21
  },
22
  "Tense": {
23
- "p": 0.9540316503,
24
- "r": 0.953313253,
25
- "f": 0.9536723164
26
  },
27
  "VerbForm": {
28
- "p": 0.9462631254,
29
- "r": 0.9375764994,
30
- "f": 0.9418997848
31
  },
32
  "Voice": {
33
- "p": 0.9736445783,
34
- "r": 0.966367713,
35
- "f": 0.9699924981
36
  },
37
  "Definite": {
38
- "p": 0.9573954984,
39
- "r": 0.9411299881,
40
- "f": 0.9491930663
41
  },
42
  "Gender": {
43
- "p": 0.9379194631,
44
- "r": 0.9288800266,
45
- "f": 0.9333778594
46
  },
47
  "Number": {
48
- "p": 0.9533227848,
49
- "r": 0.9428794992,
50
- "f": 0.9480723839
51
  },
52
  "AdpType": {
53
  "p": 1.0,
54
- "r": 0.9902740937,
55
- "f": 0.995113283
56
  },
57
  "PartType": {
58
  "p": 1.0,
@@ -60,39 +55,44 @@
60
  "f": 0.9983739837
61
  },
62
  "Case": {
63
- "p": 0.9741935484,
64
- "r": 0.9541864139,
65
- "f": 0.9640861931
66
  },
67
  "Person": {
68
- "p": 0.9787610619,
69
- "r": 0.9822380107,
70
- "f": 0.9804964539
71
  },
72
  "PronType": {
73
- "p": 0.98195242,
74
- "r": 0.984375,
75
- "f": 0.9831622177
76
  },
77
  "NumType": {
78
- "p": 0.986013986,
79
- "r": 0.9337748344,
80
- "f": 0.9591836735
81
  },
82
  "Degree": {
83
- "p": 0.9312039312,
84
- "r": 0.913253012,
85
- "f": 0.9221411192
86
  },
87
  "Reflex": {
88
  "p": 1.0,
89
  "r": 1.0,
90
  "f": 1.0
91
  },
92
  "Number[psor]": {
93
- "p": 0.9772727273,
94
  "r": 1.0,
95
- "f": 0.9885057471
96
  },
97
  "Poss": {
98
  "p": 1.0,
@@ -100,156 +100,156 @@
100
  "f": 1.0
101
  },
102
  "Foreign": {
103
- "p": 1.0,
104
- "r": 0.4,
105
- "f": 0.5714285714
106
  },
107
  "Abbr": {
108
- "p": 0.0,
109
- "r": 0.0,
110
- "f": 0.0
111
  },
112
  "Style": {
113
  "p": 1.0,
114
  "r": 1.0,
115
  "f": 1.0
116
- },
117
- "Polite": {
118
- "p": 1.0,
119
- "r": 0.5,
120
- "f": 0.6666666667
121
  }
122
  },
123
  "dep_las_per_type": {
124
  "advmod": {
125
- "p": 0.6757865937,
126
- "r": 0.697740113,
127
- "f": 0.6865879083
128
  },
129
  "root": {
130
- "p": 0.7860962567,
131
- "r": 0.7819148936,
132
- "f": 0.784
133
  },
134
  "nsubj": {
135
- "p": 0.8165236052,
136
- "r": 0.802742616,
137
- "f": 0.8095744681
138
  },
139
  "case": {
140
- "p": 0.8872403561,
141
- "r": 0.8863636364,
142
- "f": 0.8868017795
143
  },
144
  "obl": {
145
- "p": 0.6839546191,
146
- "r": 0.6562986003,
147
- "f": 0.6698412698
148
  },
149
  "cc": {
150
- "p": 0.7485714286,
151
- "r": 0.761627907,
152
- "f": 0.7550432277
153
  },
154
  "conj": {
155
- "p": 0.5506493506,
156
- "r": 0.5653333333,
157
- "f": 0.5578947368
158
  },
159
  "obj": {
160
- "p": 0.7422303473,
161
- "r": 0.7883495146,
162
- "f": 0.7645951036
163
  },
164
  "aux": {
165
- "p": 0.8738738739,
166
- "r": 0.8483965015,
167
- "f": 0.8609467456
168
  },
169
  "acl:relcl": {
170
- "p": 0.6272189349,
171
- "r": 0.572972973,
172
- "f": 0.5988700565
173
  },
174
- "obl:loc": {
175
- "p": 0.6825396825,
176
- "r": 0.6142857143,
177
- "f": 0.6466165414
178
  },
179
  "det": {
180
- "p": 0.8899835796,
181
- "r": 0.8929159802,
182
- "f": 0.8914473684
183
  },
184
  "amod": {
185
- "p": 0.766721044,
186
- "r": 0.8020477816,
187
- "f": 0.7839866555
188
  },
189
  "nmod:poss": {
190
- "p": 0.7052631579,
191
- "r": 0.6633663366,
192
- "f": 0.6836734694
193
  },
194
  "ccomp": {
195
- "p": 0.5362318841,
196
- "r": 0.5967741935,
197
- "f": 0.5648854962
198
  },
199
  "nummod": {
200
- "p": 0.8196721311,
201
- "r": 0.8333333333,
202
- "f": 0.826446281
203
  },
204
  "flat": {
205
- "p": 0.7865853659,
206
- "r": 0.8543046358,
207
- "f": 0.819047619
208
  },
209
  "compound:prt": {
210
- "p": 0.56,
211
- "r": 0.3414634146,
212
- "f": 0.4242424242
213
  },
214
  "advcl": {
215
- "p": 0.5508474576,
216
- "r": 0.5603448276,
217
- "f": 0.5555555556
218
  },
219
  "mark": {
220
- "p": 0.8445378151,
221
- "r": 0.8254620123,
222
- "f": 0.8348909657
223
  },
224
  "cop": {
225
- "p": 0.7526315789,
226
- "r": 0.8171428571,
227
- "f": 0.7835616438
228
  },
229
  "dep": {
230
- "p": 0.1772151899,
231
  "r": 0.2641509434,
232
- "f": 0.2121212121
233
  },
234
  "nmod": {
235
- "p": 0.6222222222,
236
- "r": 0.6015625,
237
- "f": 0.6117179742
238
  },
239
  "iobj": {
240
- "p": 0.7058823529,
241
- "r": 0.5454545455,
242
- "f": 0.6153846154
243
  },
244
  "xcomp": {
245
- "p": 0.475,
246
- "r": 0.3220338983,
247
- "f": 0.3838383838
248
  },
249
  "list": {
250
- "p": 0.2941176471,
251
- "r": 0.2777777778,
252
- "f": 0.2857142857
253
  },
254
  "vocative": {
255
  "p": 0.0,
@@ -257,24 +257,29 @@
257
  "f": 0.0
258
  },
259
  "fixed": {
260
- "p": 0.9189189189,
261
- "r": 0.8095238095,
262
- "f": 0.8607594937
263
  },
264
  "expl": {
265
- "p": 0.8387096774,
266
- "r": 0.7647058824,
267
- "f": 0.8
268
  },
269
  "appos": {
270
- "p": 0.5,
271
- "r": 0.4242424242,
272
- "f": 0.4590163934
273
  },
274
  "obl:tmod": {
275
- "p": 0.625,
276
- "r": 0.2777777778,
277
- "f": 0.3846153846
278
  },
279
  "discourse": {
280
  "p": 0.0,
@@ -282,26 +287,32 @@
282
  "f": 0.0
283
  }
284
  },
285
  "ents_per_type": {
286
  "PER": {
287
- "p": 0.7716049383,
288
- "r": 0.7530120482,
289
- "f": 0.762195122
290
  },
291
  "ORG": {
292
- "p": 0.7105263158,
293
- "r": 0.6,
294
- "f": 0.6506024096
295
  },
296
  "MISC": {
297
- "p": 0.6517857143,
298
- "r": 0.6460176991,
299
- "f": 0.6488888889
300
  },
301
  "LOC": {
302
- "p": 0.8224299065,
303
- "r": 0.7927927928,
304
- "f": 0.8073394495
305
  }
306
- }
307
  }
1
  {
2
  "token_acc": 0.9994672349,
3
+ "token_p": 0.9977732598,
4
+ "token_r": 0.9974835463,
5
+ "token_f": 0.997628382,
6
+ "pos_acc": 0.9507990315,
7
+ "morph_acc": 0.9371428571,
8
+ "morph_micro_p": 0.9563225412,
9
+ "morph_micro_r": 0.9482610671,
10
+ "morph_micro_f": 0.9522747434,
11
  "morph_per_feat": {
12
  "Mood": {
13
+ "p": 0.9639468691,
14
+ "r": 0.9685414681,
15
+ "f": 0.9662387066
16
  },
17
  "Tense": {
18
+ "p": 0.9529147982,
19
+ "r": 0.9600903614,
20
+ "f": 0.9564891223
21
  },
22
  "VerbForm": {
23
+ "p": 0.9471419791,
24
+ "r": 0.9430844553,
25
+ "f": 0.9451088623
26
  },
27
  "Voice": {
28
+ "p": 0.968492123,
29
+ "r": 0.9648729447,
30
+ "f": 0.9666791464
31
  },
32
  "Definite": {
33
+ "p": 0.9468212715,
34
+ "r": 0.9355985776,
35
+ "f": 0.9411764706
36
  },
37
  "Gender": {
38
+ "p": 0.9299128102,
39
+ "r": 0.9215686275,
40
+ "f": 0.9257219162
41
  },
42
  "Number": {
43
+ "p": 0.9519535375,
44
+ "r": 0.9405320814,
45
+ "f": 0.9462083443
46
  },
47
  "AdpType": {
48
  "p": 1.0,
49
+ "r": 0.9840848806,
50
+ "f": 0.9919786096
51
  },
52
  "PartType": {
53
  "p": 1.0,
55
  "f": 0.9983739837
56
  },
57
  "Case": {
58
+ "p": 0.9727126806,
59
+ "r": 0.9573459716,
60
+ "f": 0.9649681529
61
  },
62
  "Person": {
63
+ "p": 0.9770723104,
64
+ "r": 0.9840142096,
65
+ "f": 0.9805309735
66
  },
67
  "PronType": {
68
+ "p": 0.9826875515,
69
+ "r": 0.9802631579,
70
+ "f": 0.9814738576
71
  },
72
  "NumType": {
73
+ "p": 0.9793103448,
74
+ "r": 0.940397351,
75
+ "f": 0.9594594595
76
  },
77
  "Degree": {
78
+ "p": 0.9402985075,
79
+ "r": 0.9108433735,
80
+ "f": 0.9253365973
81
  },
82
  "Reflex": {
83
  "p": 1.0,
84
  "r": 1.0,
85
  "f": 1.0
86
  },
87
+ "Polite": {
88
+ "p": 0.75,
89
+ "r": 0.75,
90
+ "f": 0.75
91
+ },
92
  "Number[psor]": {
93
+ "p": 0.9885057471,
94
  "r": 1.0,
95
+ "f": 0.9942196532
96
  },
97
  "Poss": {
98
  "p": 1.0,
100
  "f": 1.0
101
  },
102
  "Foreign": {
103
+ "p": 0.8333333333,
104
+ "r": 0.5,
105
+ "f": 0.625
106
  },
107
  "Abbr": {
108
+ "p": 1.0,
109
+ "r": 0.2,
110
+ "f": 0.3333333333
111
  },
112
  "Style": {
113
  "p": 1.0,
114
  "r": 1.0,
115
  "f": 1.0
116
  }
117
  },
118
+ "sents_p": 0.9045045045,
119
+ "sents_r": 0.890070922,
120
+ "sents_f": 0.8972296693,
121
+ "dep_uas": 0.8070861741,
122
+ "dep_las": 0.7640549905,
123
  "dep_las_per_type": {
124
  "advmod": {
125
+ "p": 0.6948682386,
126
+ "r": 0.7076271186,
127
+ "f": 0.7011896431
128
  },
129
  "root": {
130
+ "p": 0.8078994614,
131
+ "r": 0.7978723404,
132
+ "f": 0.8028545941
133
  },
134
  "nsubj": {
135
+ "p": 0.8367129136,
136
+ "r": 0.8270042194,
137
+ "f": 0.8318302387
138
  },
139
  "case": {
140
+ "p": 0.8841584158,
141
+ "r": 0.8806706114,
142
+ "f": 0.8824110672
143
  },
144
  "obl": {
145
+ "p": 0.6810207337,
146
+ "r": 0.6630434783,
147
+ "f": 0.6719118804
148
  },
149
  "cc": {
150
+ "p": 0.7620396601,
151
+ "r": 0.7819767442,
152
+ "f": 0.7718794835
153
  },
154
  "conj": {
155
+ "p": 0.6066481994,
156
+ "r": 0.584,
157
+ "f": 0.5951086957
158
  },
159
  "obj": {
160
+ "p": 0.7678244973,
161
+ "r": 0.8155339806,
162
+ "f": 0.790960452
163
  },
164
  "aux": {
165
+ "p": 0.875,
166
+ "r": 0.8571428571,
167
+ "f": 0.8659793814
168
  },
169
  "acl:relcl": {
170
+ "p": 0.5852272727,
171
+ "r": 0.5567567568,
172
+ "f": 0.5706371191
173
  },
174
+ "advmod:lmod": {
175
+ "p": 0.7042253521,
176
+ "r": 0.7462686567,
177
+ "f": 0.7246376812
178
  },
179
  "det": {
180
+ "p": 0.903814262,
181
+ "r": 0.8978583196,
182
+ "f": 0.9008264463
183
  },
184
  "amod": {
185
+ "p": 0.7807757167,
186
+ "r": 0.7901023891,
187
+ "f": 0.7854113656
188
  },
189
  "nmod:poss": {
190
+ "p": 0.7083333333,
191
+ "r": 0.6732673267,
192
+ "f": 0.6903553299
193
  },
194
  "ccomp": {
195
+ "p": 0.6212121212,
196
+ "r": 0.6612903226,
197
+ "f": 0.640625
198
  },
199
  "nummod": {
200
+ "p": 0.7868852459,
201
+ "r": 0.8,
202
+ "f": 0.7933884298
203
  },
204
  "flat": {
205
+ "p": 0.7804878049,
206
+ "r": 0.8476821192,
207
+ "f": 0.8126984127
208
  },
209
  "compound:prt": {
210
+ "p": 0.652173913,
211
+ "r": 0.3658536585,
212
+ "f": 0.46875
213
  },
214
  "advcl": {
215
+ "p": 0.6339285714,
216
+ "r": 0.6120689655,
217
+ "f": 0.6228070175
218
  },
219
  "mark": {
220
+ "p": 0.8773784355,
221
+ "r": 0.8521560575,
222
+ "f": 0.8645833333
223
  },
224
  "cop": {
225
+ "p": 0.7700534759,
226
+ "r": 0.8228571429,
227
+ "f": 0.7955801105
228
  },
229
  "dep": {
230
+ "p": 0.1647058824,
231
  "r": 0.2641509434,
232
+ "f": 0.2028985507
233
  },
234
  "nmod": {
235
+ "p": 0.6058252427,
236
+ "r": 0.609375,
237
+ "f": 0.6075949367
238
  },
239
  "iobj": {
240
+ "p": 0.8181818182,
241
+ "r": 0.4090909091,
242
+ "f": 0.5454545455
243
  },
244
  "xcomp": {
245
+ "p": 0.431372549,
246
+ "r": 0.3728813559,
247
+ "f": 0.4
248
  },
249
  "list": {
250
+ "p": 0.3,
251
+ "r": 0.1666666667,
252
+ "f": 0.2142857143
253
  },
254
  "vocative": {
255
  "p": 0.0,
257
  "f": 0.0
258
  },
259
  "fixed": {
260
+ "p": 0.85,
261
+ "r": 0.8292682927,
262
+ "f": 0.8395061728
263
+ },
264
+ "obl:lmod": {
265
+ "p": 0.0,
266
+ "r": 0.0,
267
+ "f": 0.0
268
  },
269
  "expl": {
270
+ "p": 0.7941176471,
271
+ "r": 0.7941176471,
272
+ "f": 0.7941176471
273
  },
274
  "appos": {
275
+ "p": 0.4333333333,
276
+ "r": 0.3939393939,
277
+ "f": 0.4126984127
278
  },
279
  "obl:tmod": {
280
+ "p": 0.8571428571,
281
+ "r": 0.3333333333,
282
+ "f": 0.48
283
  },
284
  "discourse": {
285
  "p": 0.0,
287
  "f": 0.0
288
  }
289
  },
290
+ "tag_acc": 0.9507990315,
291
+ "lemma_acc": 0.8491041162,
292
+ "ents_p": 0.7682119205,
293
+ "ents_r": 0.725,
294
+ "ents_f": 0.7459807074,
295
  "ents_per_type": {
296
  "PER": {
297
+ "p": 0.8089171975,
298
+ "r": 0.765060241,
299
+ "f": 0.786377709
300
  },
301
  "ORG": {
302
+ "p": 0.7215189873,
303
+ "r": 0.6333333333,
304
+ "f": 0.674556213
305
  },
306
  "MISC": {
307
+ "p": 0.6782608696,
308
+ "r": 0.6902654867,
309
+ "f": 0.6842105263
310
  },
311
  "LOC": {
312
+ "p": 0.8431372549,
313
+ "r": 0.7747747748,
314
+ "f": 0.8075117371
315
  }
316
+ },
317
+ "speed": 10057.2129225514
318
  }
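The `p`, `r`, and `f` triples reported above appear to follow the usual harmonic-mean definition of the F-score, `f = 2pr / (p + r)`. The sketch below spot-checks a few of the new values copied from the diff. The helper name `f_score` and the choice of rows are illustrative assumptions, not part of the pipeline.

```python
import math


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if p + r == 0.0:
        return 0.0
    return 2.0 * p * r / (p + r)


# (precision, recall, reported F) copied from the "+" lines of the diff above.
reported = {
    "Abbr": (1.0, 0.2, 0.3333333333),
    "Foreign": (0.8333333333, 0.5, 0.625),
    "obl:tmod": (0.8571428571, 0.3333333333, 0.48),
    "ents": (0.7682119205, 0.725, 0.7459807074),
    "vocative": (0.0, 0.0, 0.0),
}


def mismatches(rows: dict, tol: float = 1e-6) -> list:
    """Return the labels whose reported F differs from the recomputed one."""
    return [
        label
        for label, (p, r, f) in rows.items()
        if not math.isclose(f_score(p, r), f, abs_tol=tol)
    ]


if __name__ == "__main__":
    print(mismatches(reported))
```

A low F-score with perfect precision, as for `Abbr`, simply reflects low recall: the few predictions made were right, but most gold labels were missed.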
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
config.cfg CHANGED
@@ -1,10 +1,8 @@
1
  [paths]
2
- train = "corpus/da-core-news/train.spacy"
3
- dev = "corpus/da-core-news/dev.spacy"
4
  vectors = null
5
- raw = null
6
  init_tok2vec = null
7
- vocab_data = null
8
 
9
  [system]
10
  gpu_allocator = null
@@ -24,6 +22,7 @@ tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
24
 
25
  [components.attribute_ruler]
26
  factory = "attribute_ruler"
27
  validate = false
28
 
29
  [components.lemmatizer]
@@ -31,9 +30,13 @@ factory = "lemmatizer"
31
  mode = "lookup"
32
  model = null
33
  overwrite = false
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
37
 
38
  [components.morphologizer.model]
39
  @architectures = "spacy.Tagger.v1"
@@ -48,6 +51,7 @@ upstream = "tok2vec"
48
  factory = "ner"
49
  incorrect_spans_key = null
50
  moves = null
51
  update_with_oracle_cut_size = 100
52
 
53
  [components.ner.model]
@@ -65,8 +69,8 @@ nO = null
65
  [components.ner.model.tok2vec.embed]
66
  @architectures = "spacy.MultiHashEmbed.v2"
67
  width = 96
68
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
69
- rows = [5000,2500,2500,2500]
70
  include_static_vectors = false
71
 
72
  [components.ner.model.tok2vec.encode]
@@ -81,6 +85,7 @@ factory = "parser"
81
  learn_tokens = false
82
  min_action_freq = 30
83
  moves = null
84
  update_with_oracle_cut_size = 100
85
 
86
  [components.parser.model]
@@ -99,6 +104,8 @@ upstream = "tok2vec"
99
 
100
  [components.senter]
101
  factory = "senter"
102
 
103
  [components.senter.model]
104
  @architectures = "spacy.Tagger.v1"
@@ -110,8 +117,8 @@ nO = null
110
  [components.senter.model.tok2vec.embed]
111
  @architectures = "spacy.MultiHashEmbed.v2"
112
  width = 16
113
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
114
- rows = [1000,500,500,500]
115
  include_static_vectors = false
116
 
117
  [components.senter.model.tok2vec.encode]
@@ -130,8 +137,8 @@ factory = "tok2vec"
130
  [components.tok2vec.model.embed]
131
  @architectures = "spacy.MultiHashEmbed.v2"
132
  width = ${components.tok2vec.model.encode:width}
133
- attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
134
- rows = [5000,2500,2500,2500]
135
  include_static_vectors = false
136
 
137
  [components.tok2vec.model.encode]
@@ -145,22 +152,19 @@ maxout_pieces = 3
145
 
146
  [corpora.dev]
147
  @readers = "spacy.Corpus.v1"
148
- limit = 0
149
- max_length = 0
150
- path = ${paths:dev}
151
  gold_preproc = false
152
  augmenter = null
153
 
154
  [corpora.train]
155
  @readers = "spacy.Corpus.v1"
156
- path = ${paths:train}
157
- max_length = 5000
158
  gold_preproc = false
159
  limit = 0
160
-
161
- [corpora.train.augmenter]
162
- @augmenters = "spacy.lower_case.v1"
163
- level = 0.1
164
 
165
  [training]
166
  train_corpus = "corpora.train"
@@ -191,9 +195,8 @@ compound = 1.001
191
  t = 0.0
192
 
193
  [training.logger]
194
- @loggers = "spacy.WandbLogger.v1"
195
- project_name = "spacy-v3.0.0a2"
196
- remove_config_values = []
197
 
198
  [training.optimizer]
199
  @optimizers = "Adam.v1"
@@ -216,16 +219,17 @@ dep_las_per_type = null
216
  sents_p = null
217
  sents_r = null
218
  sents_f = 0.02
219
- lemma_acc = 0.33
220
- ents_f = 0.33
221
  ents_p = 0.0
222
  ents_r = 0.0
223
  ents_per_type = null
224
 
225
  [pretraining]
226
 
227
  [initialize]
228
- vocab_data = ${paths.vocab_data}
229
  vectors = ${paths.vectors}
230
  init_tok2vec = ${paths.init_tok2vec}
231
  before_init = null
1
  [paths]
2
+ train = null
3
+ dev = null
4
  vectors = null
5
  init_tok2vec = null
6
 
7
  [system]
8
  gpu_allocator = null
22
 
23
  [components.attribute_ruler]
24
  factory = "attribute_ruler"
25
+ scorer = {"@scorers":"spacy.attribute_ruler_scorer.v1"}
26
  validate = false
27
 
28
  [components.lemmatizer]
30
  mode = "lookup"
31
  model = null
32
  overwrite = false
33
+ scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
37
+ extend = false
38
+ overwrite = true
39
+ scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
40
 
41
  [components.morphologizer.model]
42
  @architectures = "spacy.Tagger.v1"
51
  factory = "ner"
52
  incorrect_spans_key = null
53
  moves = null
54
+ scorer = {"@scorers":"spacy.ner_scorer.v1"}
55
  update_with_oracle_cut_size = 100
56
 
57
  [components.ner.model]
69
  [components.ner.model.tok2vec.embed]
70
  @architectures = "spacy.MultiHashEmbed.v2"
71
  width = 96
72
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
73
+ rows = [5000,2500,2500,2500,100]
74
  include_static_vectors = false
75
 
76
  [components.ner.model.tok2vec.encode]
85
  learn_tokens = false
86
  min_action_freq = 30
87
  moves = null
88
+ scorer = {"@scorers":"spacy.parser_scorer.v1"}
89
  update_with_oracle_cut_size = 100
90
 
91
  [components.parser.model]
104
 
105
  [components.senter]
106
  factory = "senter"
107
+ overwrite = false
108
+ scorer = {"@scorers":"spacy.senter_scorer.v1"}
109
 
110
  [components.senter.model]
111
  @architectures = "spacy.Tagger.v1"
117
  [components.senter.model.tok2vec.embed]
118
  @architectures = "spacy.MultiHashEmbed.v2"
119
  width = 16
120
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
121
+ rows = [1000,500,500,500,50]
122
  include_static_vectors = false
123
 
124
  [components.senter.model.tok2vec.encode]
137
  [components.tok2vec.model.embed]
138
  @architectures = "spacy.MultiHashEmbed.v2"
139
  width = ${components.tok2vec.model.encode:width}
140
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
141
+ rows = [5000,2500,2500,2500,100]
142
  include_static_vectors = false
143
 
144
  [components.tok2vec.model.encode]
152
 
153
  [corpora.dev]
154
  @readers = "spacy.Corpus.v1"
155
+ path = ${paths.dev}
156
  gold_preproc = false
157
+ max_length = 0
158
+ limit = 0
159
  augmenter = null
160
 
161
  [corpora.train]
162
  @readers = "spacy.Corpus.v1"
163
+ path = ${paths.train}
164
  gold_preproc = false
165
+ max_length = 0
166
  limit = 0
167
+ augmenter = null
168
 
169
  [training]
170
  train_corpus = "corpora.train"
195
  t = 0.0
196
 
197
  [training.logger]
198
+ @loggers = "spacy.ConsoleLogger.v1"
199
+ progress_bar = false
200
 
201
  [training.optimizer]
202
  @optimizers = "Adam.v1"
219
  sents_p = null
220
  sents_r = null
221
  sents_f = 0.02
222
+ lemma_acc = 0.5
223
+ ents_f = 0.16
224
  ents_p = 0.0
225
  ents_r = 0.0
226
  ents_per_type = null
227
+ speed = 0.0
228
 
229
  [pretraining]
230
 
231
  [initialize]
232
+ vocab_data = null
233
  vectors = ${paths.vectors}
234
  init_tok2vec = ${paths.init_tok2vec}
235
  before_init = null
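The config diff above adds a `SPACY` attribute to each `MultiHashEmbed` block together with one extra entry in `rows`. As far as I know, that architecture expects `attrs` and `rows` to have the same length, so a quick standard-library check can confirm each `rows` entry still lines up with its attribute. This is a minimal sketch: the `embed_blocks` dictionary is a hand-copied excerpt of the "+" lines, and `pair_attrs_with_rows` is a hypothetical helper, not a spaCy function.

```python
import json

# Hand-copied from the "+" lines of the config.cfg diff (section -> (attrs, rows)).
embed_blocks = {
    "components.ner.model.tok2vec.embed": (
        '["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]',
        "[5000,2500,2500,2500,100]",
    ),
    "components.senter.model.tok2vec.embed": (
        '["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]',
        "[1000,500,500,500,50]",
    ),
    "components.tok2vec.model.embed": (
        '["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]',
        "[5000,2500,2500,2500,100]",
    ),
}


def pair_attrs_with_rows(attrs_src: str, rows_src: str) -> dict:
    """Parse the two JSON-style lists and map each attribute to its row count."""
    attrs = json.loads(attrs_src)
    rows = json.loads(rows_src)
    if len(attrs) != len(rows):
        raise ValueError(f"attrs has {len(attrs)} entries but rows has {len(rows)}")
    return dict(zip(attrs, rows))


if __name__ == "__main__":
    for section, (attrs_src, rows_src) in embed_blocks.items():
        print(section, pair_attrs_with_rows(attrs_src, rows_src))
```

Parsing the values with `json.loads` works here only because these particular config values happen to be valid JSON lists.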
da_core_news_sm-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9bf5249f65f81282b477bb1e2c0f152b1c580d8458370e7b08e4b64028f010b3
3
- size 18846549
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bfbc3ee87da2c0ae1523f78ce34ff0713684928bac1e0d450598725555acaf5d
3
+ size 19129449
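The wheel is tracked with Git LFS, so the diff above only shows the pointer file: a `version` line, an `oid` line, and a `size` line in bytes. A small sketch of reading such pointers and comparing the old and new sizes follows. The function name `parse_lfs_pointer` is a hypothetical helper, and the pointer texts are copied from the diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9bf5249f65f81282b477bb1e2c0f152b1c580d8458370e7b08e4b64028f010b3
size 18846549
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bfbc3ee87da2c0ae1523f78ce34ff0713684928bac1e0d450598725555acaf5d
size 19129449
"""

if __name__ == "__main__":
    old = parse_lfs_pointer(OLD_POINTER)
    new = parse_lfs_pointer(NEW_POINTER)
    # Size difference of the packaged wheel between the two commits, in bytes.
    print(new["size"] - old["size"])
```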
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"da",
3
  "name":"core_news_sm",
4
- "version":"3.1.0",
5
  "description":"Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
- "spacy_version":">=3.1.0,<3.2.0",
11
- "spacy_git_version":"caba63b74",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
@@ -183,6 +183,7 @@
183
  "acl:relcl",
184
  "advcl",
185
  "advmod",
186
  "amod",
187
  "appos",
188
  "aux",
@@ -206,7 +207,7 @@
206
  "nummod",
207
  "obj",
208
  "obl",
209
- "obl:loc",
210
  "obl:tmod",
211
  "punct",
212
  "xcomp"
@@ -250,59 +251,54 @@
250
  ],
251
  "performance":{
252
  "token_acc":0.9994672349,
253
- "tag_acc":0.952251816,
254
- "pos_acc":0.952251816,
255
- "morph_acc":0.9384987893,
256
- "lemma_acc":0.8491041162,
257
- "dep_uas":0.7983240223,
258
- "dep_las":0.7531843575,
259
- "ents_p":0.7439824945,
260
- "ents_r":0.7083333333,
261
- "ents_f":0.7257203842,
262
- "sents_p":0.8375,
263
- "sents_r":0.8315602837,
264
- "sents_f":0.834519573,
265
- "speed":11486.3761387023,
266
  "morph_per_feat":{
267
  "Mood":{
268
- "p":0.9675881792,
269
- "r":0.9675881792,
270
- "f":0.9675881792
271
  },
272
  "Tense":{
273
- "p":0.9540316503,
274
- "r":0.953313253,
275
- "f":0.9536723164
276
  },
277
  "VerbForm":{
278
- "p":0.9462631254,
279
- "r":0.9375764994,
280
- "f":0.9418997848
281
  },
282
  "Voice":{
283
- "p":0.9736445783,
284
- "r":0.966367713,
285
- "f":0.9699924981
286
  },
287
  "Definite":{
288
- "p":0.9573954984,
289
- "r":0.9411299881,
290
- "f":0.9491930663
291
  },
292
  "Gender":{
293
- "p":0.9379194631,
294
- "r":0.9288800266,
295
- "f":0.9333778594
296
  },
297
  "Number":{
298
- "p":0.9533227848,
299
- "r":0.9428794992,
300
- "f":0.9480723839
301
  },
302
  "AdpType":{
303
  "p":1.0,
304
- "r":0.9902740937,
305
- "f":0.995113283
306
  },
307
  "PartType":{
308
  "p":1.0,
@@ -310,39 +306,44 @@
310
  "f":0.9983739837
311
  },
312
  "Case":{
313
- "p":0.9741935484,
314
- "r":0.9541864139,
315
- "f":0.9640861931
316
  },
317
  "Person":{
318
- "p":0.9787610619,
319
- "r":0.9822380107,
320
- "f":0.9804964539
321
  },
322
  "PronType":{
323
- "p":0.98195242,
324
- "r":0.984375,
325
- "f":0.9831622177
326
  },
327
  "NumType":{
328
- "p":0.986013986,
329
- "r":0.9337748344,
330
- "f":0.9591836735
331
  },
332
  "Degree":{
333
- "p":0.9312039312,
334
- "r":0.913253012,
335
- "f":0.9221411192
336
  },
337
  "Reflex":{
338
  "p":1.0,
339
  "r":1.0,
340
  "f":1.0
341
  },
342
  "Number[psor]":{
343
- "p":0.9772727273,
344
  "r":1.0,
345
- "f":0.9885057471
346
  },
347
  "Poss":{
348
  "p":1.0,
@@ -350,156 +351,156 @@
350
  "f":1.0
351
  },
352
  "Foreign":{
353
- "p":1.0,
354
- "r":0.4,
355
- "f":0.5714285714
356
  },
357
  "Abbr":{
358
- "p":0.0,
359
- "r":0.0,
360
- "f":0.0
361
  },
362
  "Style":{
363
  "p":1.0,
364
  "r":1.0,
365
  "f":1.0
366
- },
367
- "Polite":{
368
- "p":1.0,
369
- "r":0.5,
370
- "f":0.6666666667
371
  }
372
  },
373
  "dep_las_per_type":{
374
  "advmod":{
375
- "p":0.6757865937,
376
- "r":0.697740113,
377
- "f":0.6865879083
378
  },
379
  "root":{
380
- "p":0.7860962567,
381
- "r":0.7819148936,
382
- "f":0.784
383
  },
384
  "nsubj":{
385
- "p":0.8165236052,
386
- "r":0.802742616,
387
- "f":0.8095744681
388
  },
389
  "case":{
390
- "p":0.8872403561,
391
- "r":0.8863636364,
392
- "f":0.8868017795
393
  },
394
  "obl":{
395
- "p":0.6839546191,
396
- "r":0.6562986003,
397
- "f":0.6698412698
398
  },
399
  "cc":{
400
- "p":0.7485714286,
401
- "r":0.761627907,
402
- "f":0.7550432277
403
  },
404
  "conj":{
405
- "p":0.5506493506,
406
- "r":0.5653333333,
407
- "f":0.5578947368
408
  },
409
  "obj":{
410
- "p":0.7422303473,
411
- "r":0.7883495146,
412
- "f":0.7645951036
413
  },
414
  "aux":{
415
- "p":0.8738738739,
416
- "r":0.8483965015,
417
- "f":0.8609467456
418
  },
419
  "acl:relcl":{
420
- "p":0.6272189349,
421
- "r":0.572972973,
422
- "f":0.5988700565
423
  },
424
- "obl:loc":{
425
- "p":0.6825396825,
426
- "r":0.6142857143,
427
- "f":0.6466165414
428
  },
429
  "det":{
430
- "p":0.8899835796,
431
- "r":0.8929159802,
432
- "f":0.8914473684
433
  },
434
  "amod":{
435
- "p":0.766721044,
436
- "r":0.8020477816,
437
- "f":0.7839866555
438
  },
439
  "nmod:poss":{
440
- "p":0.7052631579,
441
- "r":0.6633663366,
442
- "f":0.6836734694
443
  },
444
  "ccomp":{
445
- "p":0.5362318841,
446
- "r":0.5967741935,
447
- "f":0.5648854962
448
  },
449
  "nummod":{
450
- "p":0.8196721311,
451
- "r":0.8333333333,
452
- "f":0.826446281
453
  },
454
  "flat":{
455
- "p":0.7865853659,
456
- "r":0.8543046358,
457
- "f":0.819047619
458
  },
459
  "compound:prt":{
460
- "p":0.56,
461
- "r":0.3414634146,
462
- "f":0.4242424242
463
  },
464
  "advcl":{
465
- "p":0.5508474576,
466
- "r":0.5603448276,
467
- "f":0.5555555556
468
  },
469
  "mark":{
470
- "p":0.8445378151,
471
- "r":0.8254620123,
472
- "f":0.8348909657
473
  },
474
  "cop":{
475
- "p":0.7526315789,
476
- "r":0.8171428571,
477
- "f":0.7835616438
478
  },
479
  "dep":{
480
- "p":0.1772151899,
481
  "r":0.2641509434,
482
- "f":0.2121212121
483
  },
484
  "nmod":{
485
- "p":0.6222222222,
486
- "r":0.6015625,
487
- "f":0.6117179742
488
  },
489
  "iobj":{
490
- "p":0.7058823529,
491
- "r":0.5454545455,
492
- "f":0.6153846154
493
  },
494
  "xcomp":{
495
- "p":0.475,
496
- "r":0.3220338983,
497
- "f":0.3838383838
498
  },
499
  "list":{
500
- "p":0.2941176471,
501
- "r":0.2777777778,
502
- "f":0.2857142857
503
  },
504
  "vocative":{
505
  "p":0.0,
@@ -507,24 +508,29 @@
507
  "f":0.0
508
  },
509
  "fixed":{
510
- "p":0.9189189189,
511
- "r":0.8095238095,
512
- "f":0.8607594937
513
  },
514
  "expl":{
515
- "p":0.8387096774,
516
- "r":0.7647058824,
517
- "f":0.8
518
  },
519
  "appos":{
520
- "p":0.5,
521
- "r":0.4242424242,
522
- "f":0.4590163934
523
  },
524
  "obl:tmod":{
525
- "p":0.625,
526
- "r":0.2777777778,
527
- "f":0.3846153846
528
  },
529
  "discourse":{
530
  "p":0.0,
@@ -532,32 +538,38 @@
532
  "f":0.0
533
  }
534
  },
535
  "ents_per_type":{
536
  "PER":{
537
- "p":0.7716049383,
538
- "r":0.7530120482,
539
- "f":0.762195122
540
  },
541
  "ORG":{
542
- "p":0.7105263158,
543
- "r":0.6,
544
- "f":0.6506024096
545
  },
546
  "MISC":{
547
- "p":0.6517857143,
548
- "r":0.6460176991,
549
- "f":0.6488888889
550
  },
551
  "LOC":{
552
- "p":0.8224299065,
553
- "r":0.7927927928,
554
- "f":0.8073394495
555
  }
556
- }
557
  },
558
  "sources":[
559
  {
560
- "name":"UD Danish DDT v2.5",
561
  "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
562
  "license":"CC BY-SA 4.0",
563
  "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
1
  {
2
  "lang":"da",
3
  "name":"core_news_sm",
4
+ "version":"3.2.0",
5
  "description":"Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
+ "spacy_version":">=3.2.0,<3.3.0",
11
+ "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
183
  "acl:relcl",
184
  "advcl",
185
  "advmod",
186
+ "advmod:lmod",
187
  "amod",
188
  "appos",
189
  "aux",
207
  "nummod",
208
  "obj",
209
  "obl",
210
+ "obl:lmod",
211
  "obl:tmod",
212
  "punct",
213
  "xcomp"
251
  ],
252
  "performance":{
253
  "token_acc":0.9994672349,
254
+ "token_p":0.9977732598,
255
+ "token_r":0.9974835463,
256
+ "token_f":0.997628382,
257
+ "pos_acc":0.9507990315,
258
+ "morph_acc":0.9371428571,
259
+ "morph_micro_p":0.9563225412,
260
+ "morph_micro_r":0.9482610671,
261
+ "morph_micro_f":0.9522747434,
262
  "morph_per_feat":{
263
  "Mood":{
264
+ "p":0.9639468691,
265
+ "r":0.9685414681,
266
+ "f":0.9662387066
267
  },
268
  "Tense":{
269
+ "p":0.9529147982,
270
+ "r":0.9600903614,
271
+ "f":0.9564891223
272
  },
273
  "VerbForm":{
274
+ "p":0.9471419791,
275
+ "r":0.9430844553,
276
+ "f":0.9451088623
277
  },
278
  "Voice":{
279
+ "p":0.968492123,
280
+ "r":0.9648729447,
281
+ "f":0.9666791464
282
  },
283
  "Definite":{
284
+ "p":0.9468212715,
285
+ "r":0.9355985776,
286
+ "f":0.9411764706
287
  },
288
  "Gender":{
289
+ "p":0.9299128102,
290
+ "r":0.9215686275,
291
+ "f":0.9257219162
292
  },
293
  "Number":{
294
+ "p":0.9519535375,
295
+ "r":0.9405320814,
296
+ "f":0.9462083443
297
  },
298
  "AdpType":{
299
  "p":1.0,
300
+ "r":0.9840848806,
301
+ "f":0.9919786096
302
  },
303
  "PartType":{
304
  "p":1.0,
306
  "f":0.9983739837
307
  },
308
  "Case":{
309
+ "p":0.9727126806,
310
+ "r":0.9573459716,
311
+ "f":0.9649681529
312
  },
313
  "Person":{
314
+ "p":0.9770723104,
315
+ "r":0.9840142096,
316
+ "f":0.9805309735
317
  },
318
  "PronType":{
319
+ "p":0.9826875515,
320
+ "r":0.9802631579,
321
+ "f":0.9814738576
322
  },
323
  "NumType":{
324
+ "p":0.9793103448,
325
+ "r":0.940397351,
326
+ "f":0.9594594595
327
  },
328
  "Degree":{
329
+ "p":0.9402985075,
330
+ "r":0.9108433735,
331
+ "f":0.9253365973
332
  },
333
  "Reflex":{
334
  "p":1.0,
335
  "r":1.0,
336
  "f":1.0
337
  },
338
+ "Polite":{
339
+ "p":0.75,
340
+ "r":0.75,
341
+ "f":0.75
342
+ },
343
  "Number[psor]":{
344
+ "p":0.9885057471,
345
  "r":1.0,
346
+ "f":0.9942196532
347
  },
348
  "Poss":{
349
  "p":1.0,
351
  "f":1.0
352
  },
353
  "Foreign":{
354
+ "p":0.8333333333,
355
+ "r":0.5,
356
+ "f":0.625
357
  },
358
  "Abbr":{
359
+ "p":1.0,
360
+ "r":0.2,
361
+ "f":0.3333333333
362
  },
363
  "Style":{
364
  "p":1.0,
365
  "r":1.0,
366
  "f":1.0
367
  }
368
  },
369
+ "sents_p":0.9045045045,
370
+ "sents_r":0.890070922,
371
+ "sents_f":0.8972296693,
372
+ "dep_uas":0.8070861741,
373
+ "dep_las":0.7640549905,
374
  "dep_las_per_type":{
375
  "advmod":{
376
+ "p":0.6948682386,
377
+ "r":0.7076271186,
378
+ "f":0.7011896431
379
  },
380
  "root":{
381
+ "p":0.8078994614,
382
+ "r":0.7978723404,
383
+ "f":0.8028545941
384
  },
385
  "nsubj":{
386
+ "p":0.8367129136,
387
+ "r":0.8270042194,
388
+ "f":0.8318302387
389
  },
390
  "case":{
391
+ "p":0.8841584158,
392
+ "r":0.8806706114,
393
+ "f":0.8824110672
394
  },
395
  "obl":{
396
+ "p":0.6810207337,
397
+ "r":0.6630434783,
398
+ "f":0.6719118804
399
  },
400
  "cc":{
401
+ "p":0.7620396601,
402
+ "r":0.7819767442,
403
+ "f":0.7718794835
404
  },
405
  "conj":{
406
+ "p":0.6066481994,
407
+ "r":0.584,
408
+ "f":0.5951086957
409
  },
410
  "obj":{
411
+ "p":0.7678244973,
412
+ "r":0.8155339806,
413
+ "f":0.790960452
414
  },
415
  "aux":{
416
+ "p":0.875,
417
+ "r":0.8571428571,
418
+ "f":0.8659793814
419
  },
420
  "acl:relcl":{
421
+ "p":0.5852272727,
422
+ "r":0.5567567568,
423
+ "f":0.5706371191
424
  },
425
+ "advmod:lmod":{
426
+ "p":0.7042253521,
427
+ "r":0.7462686567,
428
+ "f":0.7246376812
429
  },
430
  "det":{
431
+ "p":0.903814262,
432
+ "r":0.8978583196,
433
+ "f":0.9008264463
434
  },
435
  "amod":{
436
+ "p":0.7807757167,
437
+ "r":0.7901023891,
438
+ "f":0.7854113656
439
  },
440
  "nmod:poss":{
441
+ "p":0.7083333333,
442
+ "r":0.6732673267,
443
+ "f":0.6903553299
444
  },
445
  "ccomp":{
446
+ "p":0.6212121212,
447
+ "r":0.6612903226,
448
+ "f":0.640625
449
  },
450
  "nummod":{
451
+ "p":0.7868852459,
452
+ "r":0.8,
453
+ "f":0.7933884298
454
  },
455
  "flat":{
456
+ "p":0.7804878049,
457
+ "r":0.8476821192,
458
+ "f":0.8126984127
459
  },
460
  "compound:prt":{
461
+ "p":0.652173913,
462
+ "r":0.3658536585,
463
+ "f":0.46875
464
  },
465
  "advcl":{
466
+ "p":0.6339285714,
467
+ "r":0.6120689655,
468
+ "f":0.6228070175
469
  },
470
  "mark":{
471
+ "p":0.8773784355,
472
+ "r":0.8521560575,
473
+ "f":0.8645833333
474
  },
475
  "cop":{
476
+ "p":0.7700534759,
477
+ "r":0.8228571429,
478
+ "f":0.7955801105
479
  },
480
  "dep":{
481
+ "p":0.1647058824,
482
  "r":0.2641509434,
483
+ "f":0.2028985507
484
  },
485
  "nmod":{
486
+ "p":0.6058252427,
487
+ "r":0.609375,
488
+ "f":0.6075949367
489
  },
490
  "iobj":{
491
+ "p":0.8181818182,
492
+ "r":0.4090909091,
493
+ "f":0.5454545455
494
  },
495
  "xcomp":{
496
+ "p":0.431372549,
497
+ "r":0.3728813559,
498
+ "f":0.4
499
  },
500
  "list":{
501
+ "p":0.3,
502
+ "r":0.1666666667,
503
+ "f":0.2142857143
504
  },
505
  "vocative":{
506
  "p":0.0,
508
  "f":0.0
509
  },
510
  "fixed":{
511
+ "p":0.85,
512
+ "r":0.8292682927,
513
+ "f":0.8395061728
514
+ },
515
+ "obl:lmod":{
516
+ "p":0.0,
517
+ "r":0.0,
518
+ "f":0.0
519
  },
520
  "expl":{
521
+ "p":0.7941176471,
522
+ "r":0.7941176471,
523
+ "f":0.7941176471
524
  },
525
  "appos":{
526
+ "p":0.4333333333,
527
+ "r":0.3939393939,
528
+ "f":0.4126984127
529
  },
530
  "obl:tmod":{
531
+ "p":0.8571428571,
532
+ "r":0.3333333333,
533
+ "f":0.48
534
  },
535
  "discourse":{
536
  "p":0.0,
538
  "f":0.0
539
  }
540
  },
541
+ "tag_acc":0.9507990315,
542
+ "lemma_acc":0.8491041162,
543
+ "ents_p":0.7682119205,
544
+ "ents_r":0.725,
545
+ "ents_f":0.7459807074,
546
  "ents_per_type":{
547
  "PER":{
548
+ "p":0.8089171975,
549
+ "r":0.765060241,
550
+ "f":0.786377709
551
  },
552
  "ORG":{
553
+ "p":0.7215189873,
554
+ "r":0.6333333333,
555
+ "f":0.674556213
556
  },
557
  "MISC":{
558
+ "p":0.6782608696,
559
+ "r":0.6902654867,
560
+ "f":0.6842105263
561
  },
562
  "LOC":{
563
+ "p":0.8431372549,
564
+ "r":0.7747747748,
565
+ "f":0.8075117371
566
  }
567
+ },
568
+ "speed":10057.2129225514
569
  },
570
  "sources":[
571
  {
572
+ "name":"UD Danish DDT v2.8",
573
  "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
574
  "license":"CC BY-SA 4.0",
575
  "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
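The `spacy_version` field in the new `meta.json` moves from `>=3.1.0,<3.2.0` to `>=3.2.0,<3.3.0`, so the updated package targets the 3.2.x series. The sketch below shows how such a range can be read. It is deliberately simplified: it handles only `>=` and `<` clauses with purely numeric versions, and `satisfies` is a hypothetical helper rather than anything spaCy or pip exposes.

```python
def parse_version(text: str) -> tuple:
    """Turn '3.2.0' into (3, 2, 0) so versions compare component-wise."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated list of '>=' and '<' clauses."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if candidate < parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if candidate >= parse_version(clause[1:]):
                return False
        else:
            # Other operators are out of scope for this sketch.
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


NEW_SPEC = ">=3.2.0,<3.3.0"  # copied from the "+" line of meta.json
OLD_SPEC = ">=3.1.0,<3.2.0"  # copied from the "-" line of meta.json

if __name__ == "__main__":
    for installed in ("3.1.0", "3.2.1", "3.3.0"):
        print(installed, satisfies(installed, OLD_SPEC), satisfies(installed, NEW_SPEC))
```

Real tooling should rely on a full version-specifier parser, since pre-release and post-release tags are not covered here.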
morphologizer/cfg CHANGED
@@ -1,4 +1,5 @@
1
  {
2
  "labels_morph":{
3
  "AdpType=Prep|POS=ADP":"AdpType=Prep",
4
  "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Com|Number=Sing",
@@ -316,5 +317,6 @@
316
  "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
317
  "POS=DET|PronType=Dem":90,
318
  "Definite=Def|Number=Plur|POS=NOUN":92
319
- }
320
  }
1
  {
2
+ "extend":false,
3
  "labels_morph":{
4
  "AdpType=Prep|POS=ADP":"AdpType=Prep",
5
  "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Com|Number=Sing",
317
  "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
318
  "POS=DET|PronType=Dem":90,
319
  "Definite=Def|Number=Plur|POS=NOUN":92
320
+ },
321
+ "overwrite":true
322
  }
morphologizer/model CHANGED
Binary files a/morphologizer/model and b/morphologizer/model differ
ner/model CHANGED
Binary files a/ner/model and b/ner/model differ
parser/model CHANGED
Binary files a/parser/model and b/parser/model differ
parser/moves CHANGED
@@ -1 +1 @@
1
- ��moves�2{"0":{"":41514},"1":{"":34295},"2":{"case":7489,"nsubj":6009,"det":4334,"amod":3968,"advmod":3657,"mark":3529,"aux":2432,"cc":2261,"punct":2182,"cop":1329,"obl":894,"nummod":799,"nmod:poss":651,"nmod":460,"expl":291,"ccomp":202,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":49,"acl:relcl":43},"3":{"punct":8601,"obl":3949,"obj":3758,"nmod":3565,"conj":2745,"advmod":2095,"flat":1295,"nsubj":1172,"acl:relcl":1131,"advcl":808,"amod":628,"obl:loc":467,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":167,"list":161,"nmod:poss":156,"punct||conj":151,"mark":137,"cc":135,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4367}}�cfg��neg_key�
1
+ ��moves�D{"0":{"":41514},"1":{"":34295},"2":{"case":7489,"nsubj":6009,"det":4334,"amod":3968,"advmod":3657,"mark":3529,"aux":2432,"cc":2261,"punct":2182,"cop":1329,"obl":894,"nummod":799,"nmod:poss":651,"nmod":460,"expl":291,"ccomp":202,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":49,"acl:relcl":43},"3":{"punct":8601,"obl":3949,"obj":3758,"nmod":3565,"conj":2745,"advmod":2095,"flat":1295,"nsubj":1172,"acl:relcl":1131,"advcl":808,"amod":628,"advmod:lmod":423,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":167,"list":161,"nmod:poss":156,"punct||conj":151,"mark":137,"cc":135,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"obl:lmod":44,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4367}}�cfg��neg_key�
senter/cfg CHANGED
@@ -1,3 +1,3 @@
1
  {
2
-
3
  }
1
  {
2
+ "overwrite":false
3
  }
senter/model CHANGED
Binary files a/senter/model and b/senter/model differ
tok2vec/model CHANGED
Binary files a/tok2vec/model and b/tok2vec/model differ
tokenizer CHANGED
The diff for this file is too large to render. See raw diff
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:19125b53329443c67b0fdae8030451f8c33ca100e8ca35802c30adae739cf8a8
3
- size 459574
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:786ff7139c6dd7568c66e2ae810f42fa3afc33860aa3d81bd5dfeb263295d80c
3
+ size 459696
vocab/vectors.cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "mode":"default"
3
+ }