Update spaCy pipeline

Files changed (10)

README.md +193 -1
config.cfg +1 -2
da_dacy_small_trf-any-py3-none-any.whl +2 -2
meta.json +218 -192
morphologizer/model +1 -1
ner/model +1 -1
parser/model +1 -1
transformer/model/pytorch_model.bin +1 -1
transformer/model/tokenizer_config.json +1 -1
vocab/strings.json +2 -2

README.md CHANGED

	@@ -1 +1,193 @@
1	- ~~This is currently a placeholder model.~~

+---
+tags:
+- spacy
+- token-classification
+language:
+- da
+license: Apache-2.0-License
+model-index:
+- name: da_dacy_small_trf
+  results:
+  - tasks:
+      name: NER
+      type: token-classification
+      metrics:
+      - name: Precision
+        type: precision
+        value: 0.81724846
+      - name: Recall
+        type: recall
+        value: 0.8291666667
+      - name: F Score
+        type: f_score
+        value: 0.8231644261
+  - tasks:
+      name: SENTER
+      type: token-classification
+      metrics:
+      - name: Precision
+        type: precision
+        value: 0.8603839442
+      - name: Recall
+        type: recall
+        value: 0.8741134752
+      - name: F Score
+        type: f_score
+        value: 0.8671943712
+  - tasks:
+      name: UNLABELED_DEPENDENCIES
+      type: token-classification
+      metrics:
+      - name: Accuracy
+        type: accuracy
+        value: 0.8492442546
+  - tasks:
+      name: LABELED_DEPENDENCIES
+      type: token-classification
+      metrics:
+      - name: Accuracy
+        type: accuracy
+        value: 0.8176199573
+---
+<a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>
+# DaCy small transformer
+DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.
+DaCy's largest pipeline has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging and dependency
+parsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
+DaCy also contains guides on usage of the package as well as behavioural tests for biases and robustness of Danish NLP pipelines.
+| Feature | Description |
+| --- | --- |
+| **Name** | `da_dacy_small_trf` |
+| **Version** | `0.1.0` |
+| **spaCy** | `>=3.1.1,<3.2.0` |
+| **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
+| **Components** | `transformer`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
+| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
+| **Sources** | [UD Danish DDT v2.5](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Maltehb/-l-ctra-danish-electra-small-cased](https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-cased) (Malte Højmark-Bertelsen) |
+| **License** | `Apache-2.0 License` |
+| **Author** | [Centre for Humanities Computing Aarhus](https://chcaa.io/#/) |
+### Label Scheme
+<details>
+<summary>View label scheme (192 labels for 3 components)</summary>
+| Component | Labels |
+| --- | --- |
+| **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
+| **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:loc`, `obl:tmod`, `punct`, `xcomp` |
+| **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
+</details>
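The morphologizer labels above pack the part-of-speech tag and the morphological features into one string, with `|` separating `Key=Value` pairs (the `\|` in the table is only Markdown escaping). A minimal sketch of how such a label could be split into a dictionary follows; the helper name `parse_morph_label` is hypothetical and not part of spaCy or DaCy.

```python
def parse_morph_label(label):
    """Split a morphologizer label into a {feature: value} dict.

    Accepts both the raw form ("A=B|C=D") and the Markdown-escaped
    form used in the table ("A=B\\|C=D").
    """
    cleaned = label.replace("\\|", "|")
    features = {}
    for pair in cleaned.split("|"):
        key, _, value = pair.partition("=")
        features[key] = value
    return features


# Label copied from the table, including the Markdown escaping.
label = r"Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN"
features = parse_morph_label(label)
pos = features.pop("POS")  # the coarse part-of-speech tag
print(pos, features)
```

Multi-valued features such as `PronType=Int,Rel` are left as a single string here; splitting them on the comma is a further design choice.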
+### Accuracy
+| Type | Score |
+| --- | --- |
+| `POS_ACC` | 95.83 |
+| `MORPH_ACC` | 95.70 |
+| `DEP_UAS` | 84.92 |
+| `DEP_LAS` | 81.76 |
+| `SENTS_P` | 86.04 |
+| `SENTS_R` | 87.41 |
+| `SENTS_F` | 86.72 |
+| `LEMMA_ACC` | 84.91 |
+| `ENTS_F` | 82.32 |
+| `ENTS_P` | 81.72 |
+| `ENTS_R` | 82.92 |
+| `TRANSFORMER_LOSS` | 417466.87 |
+| `MORPHOLOGIZER_LOSS` | 34589.66 |
+| `PARSER_LOSS` | 151048.98 |
+| `NER_LOSS` | 5460.98 |
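The F-scores reported for NER and sentence segmentation can be checked against the precision and recall values in the metadata. The sketch below assumes the standard balanced F-score (the harmonic mean of precision and recall); the function name `f_score` is illustrative.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata of this commit.
ents_f = f_score(0.81724846, 0.8291666667)
sents_f = f_score(0.8603839442, 0.8741134752)

# The table reports these as percentages rounded to two decimals.
print(round(100 * ents_f, 2), round(100 * sents_f, 2))
```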
+## Bias and Robustness
+Besides the validation done by spaCy on the DaNE test set, DaCy also applies a series of augmentations to that test set to see how well the models handle them.
+These can be seen as behavioural probes akin to the NLP checklist.
+### Deterministic Augmentations
+Deterministic augmentations are augmentations that always yield the same result.
+| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |
+| --- | --- | --- | --- |  --- | --- | --- |  --- |
+| No augmentation | 0.98 | 0.974 | 0.868 |  0.836 | 0.936 | 0.844 | 0.765 |
+| Æøå Augmentation | 0.955 | 0.948 | 0.823 |  0.783 | 0.922 | 0.754 | 0.718 |
+| Lowercase | 0.974 | 0.97 | 0.862 |  0.828 | 0.905 | 0.848 | 0.681 |
+| No Spacing | 0.229 | 0.229 | 0.004 |  0.003 | 0.824 | 0.225 | 0.048 |
+| Abbreviated first names | 0.979 | 0.973 | 0.864 |  0.832 | 0.94 | 0.845 | 0.699 |
+| Input size augmentation 5 sentences | 0.956 | 0.956 | 0.851 |  0.818 | 0.883 | 0.844 | 0.743 |
+| Input size augmentation 10 sentences | 0.959 | 0.958 | 0.853 |  0.821 | 0.897 | 0.844 | 0.755 |
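Several of the deterministic augmentations are simple string transformations. The sketch below is only a plain-Python illustration of the idea, not DaCy's implementation; the function names are hypothetical, and the handling of uppercase Æ, Ø and Å is an assumption.

```python
# Spelling variants for the Danish letters; the uppercase entries are an assumption.
SPELLING_VARIANTS = {
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
}


def aeoeaa(text):
    """Replace æ, ø and å with the spelling variations ae, oe and aa."""
    return "".join(SPELLING_VARIANTS.get(ch, ch) for ch in text)


def lowercase(text):
    """Lowercase all text."""
    return text.lower()


def no_spacing(text):
    """Remove all spaces from the text."""
    return text.replace(" ", "")


def abbreviate_first_name(full_name):
    """Turn 'Kenneth Enevoldsen' into 'K. Enevoldsen'."""
    first, _, rest = full_name.partition(" ")
    return f"{first[0]}. {rest}" if rest else full_name


sentence = "Søren spiser rødgrød på åen"
print(aeoeaa(sentence))
print(lowercase(sentence))
print(no_spacing(sentence))
print(abbreviate_first_name("Kenneth Enevoldsen"))
```

Because each function is deterministic, a single evaluation pass per augmentation is enough, which is why the table above reports no standard deviations.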
+### Stochastic Augmentations
+Stochastic augmentations are augmentations that are repeated multiple times to estimate the effect of the augmentation.
+| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |
+| --- | --- | --- | --- |  --- | --- | --- |  --- |
+| Keystroke errors 2% | 0.931 (0.003) | 0.929 (0.003) | 0.797 (0.003) |  0.753 (0.003) | 0.884 (0.003) | 0.772 (0.003) | 0.657 (0.003) |
+| Keystroke errors 5% | 0.859 (0.003) | 0.863 (0.003) | 0.699 (0.003) |  0.641 (0.003) | 0.824 (0.003) | 0.681 (0.003) | 0.53 (0.003) |
+| Keystroke errors 15% | 0.633 (0.006) | 0.662 (0.006) | 0.439 (0.006) |  0.358 (0.006) | 0.688 (0.006) | 0.459 (0.006) | 0.293 (0.006) |
+| Danish names | 0.979 (0.0) | 0.974 (0.0) | 0.867 (0.0) |  0.835 (0.0) | 0.943 (0.0) | 0.847 (0.0) | 0.748 (0.0) |
+| Muslim names | 0.979 (0.0) | 0.974 (0.0) | 0.865 (0.0) |  0.833 (0.0) | 0.94 (0.0) | 0.847 (0.0) | 0.732 (0.0) |
+| Female names | 0.979 (0.0) | 0.974 (0.0) | 0.867 (0.0) |  0.835 (0.0) | 0.946 (0.0) | 0.847 (0.0) | 0.754 (0.0) |
+| Male names | 0.979 (0.0) | 0.974 (0.0) | 0.867 (0.0) |  0.835 (0.0) | 0.943 (0.0) | 0.847 (0.0) | 0.748 (0.0) |
+| Spacing Augmentation 5% | 0.941 (0.002) | 0.936 (0.002) | 0.755 (0.002) |  0.725 (0.002) | 0.907 (0.002) | 0.811 (0.002) | 0.699 (0.002) |
+<details>
+<summary> Description of Augmenters </summary>
+**No augmentation:**
+Applies no augmentation to the DaNE test set.
+**Æøå Augmentation:**
+This augmentation replaces æ, ø, and å with their spelling variations ae, oe and aa respectively.
+**Lowercase:**
+This augmentation lowercases all text.
+**No Spacing:**
+This augmentation removes all spacing from the text.
+**Abbreviated first names:**
+This augmentation abbreviates the first names of entities. For instance, 'Kenneth Enevoldsen' would turn into 'K. Enevoldsen'.
+**Keystroke errors 2%:**
+This augmentation simulates keystroke errors by replacing 2% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Keystroke errors 5%:**
+This augmentation simulates keystroke errors by replacing 5% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Keystroke errors 15%:**
+This augmentation simulates keystroke errors by replacing 15% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Danish names:**
+This augmentation replaces all names with Danish names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Muslim names:**
+This augmentation replaces all names with Muslim names derived from Meldgaard (2005). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Female names:**
+This augmentation replaces all names with Danish female names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Male names:**
+This augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+**Spacing Augmentation 5%:**
+This augmentation randomly perturbs the spacing of the text at a rate of 5%. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
+ </details>
+ <br />
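The stochastic protocol described above (apply a random augmentation, score, repeat 20 times, report mean and standard deviation) can be sketched in plain Python. Everything here is illustrative: the neighbour map is a small hypothetical subset rather than a full Danish keyboard layout, and `char_accuracy` is only a stand-in for a real pipeline metric such as NER F1.

```python
import random
import statistics

# Hypothetical, heavily truncated neighbour map (not a complete Danish layout).
NEIGHBOURS = {
    "a": "qwsz", "s": "awedxz", "d": "serfcx", "e": "wsdr",
    "r": "edft", "t": "rfgy", "n": "bhjm", "o": "iklp", "æ": "løå",
}


def keystroke_errors(text, rate, rng):
    """Replace each mapped character with a neighbouring key with probability `rate`."""
    out = []
    for ch in text:
        if ch in NEIGHBOURS and rng.random() < rate:
            out.append(rng.choice(NEIGHBOURS[ch]))
        else:
            out.append(ch)
    return "".join(out)


def char_accuracy(original, augmented):
    """Stand-in score: fraction of characters left unchanged."""
    same = sum(a == b for a, b in zip(original, augmented))
    return same / len(original)


def repeated_estimate(text, rate, repeats=20, seed=0):
    """Repeat the augmentation and return (mean, standard deviation) of the score."""
    rng = random.Random(seed)
    scores = [
        char_accuracy(text, keystroke_errors(text, rate, rng))
        for _ in range(repeats)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


mean, sd = repeated_estimate("der er en rød tråd i teksten", rate=0.15)
print(f"{mean:.3f} ({sd:.3f})")
```

The printed format mirrors the `mean (standard deviation)` cells of the stochastic table; fixing the seed makes the estimate reproducible.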
+### Hardware
+This pipeline was trained and run on a Quadro RTX 8000 GPU.

config.cfg CHANGED

@@ -104,7 +104,6 @@ stride = 96
 [components.transformer.model.tokenizer_config]
 use_fast = true
-strip_accents = false
 [corpora]
@@ -136,7 +135,7 @@ dropout = 0.1
 accumulate_gradient = 3
 patience = 5000
 max_epochs = 0
-max_steps = 1
 eval_frequency = 1000
 frozen_components = []
 before_to_disk = null

 [components.transformer.model.tokenizer_config]
 use_fast = true
 [corpora]
 accumulate_gradient = 3
 patience = 5000
 max_epochs = 0
+max_steps = 40000
 eval_frequency = 1000
 frozen_components = []
 before_to_disk = null
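The change from `max_steps = 1` to `max_steps = 40000` turns what looks like a one-step test run into a full training schedule. As a rough sketch, the schedule implied by the new values can be read with Python's standard `configparser`. Two assumptions are made here: that these keys sit in the `[training]` section, as in a standard spaCy config (the section header is not visible in the hunk), and that an evaluation happens every `eval_frequency` steps. spaCy itself parses configs with its own loader, so `configparser` is used purely for illustration.

```python
import configparser

# Fragment reconstructed from the new side of the diff; the [training]
# header is an assumption, since the hunk does not show it.
fragment = """
[training]
accumulate_gradient = 3
patience = 5000
max_epochs = 0
max_steps = 40000
eval_frequency = 1000
"""

parser = configparser.ConfigParser()
parser.read_string(fragment)
training = parser["training"]

max_steps = training.getint("max_steps")
eval_frequency = training.getint("eval_frequency")
patience = training.getint("patience")

# Approximate number of evaluation points over the whole run.
n_evaluations = max_steps // eval_frequency
# Evaluations without improvement tolerated before early stopping.
patience_in_evaluations = patience // eval_frequency

print(n_evaluations, patience_in_evaluations)
```

With `max_epochs = 0` (no epoch limit), the run ends either at `max_steps` or when early stopping triggers, whichever comes first.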

da_dacy_small_trf-any-py3-none-any.whl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:58343f2fc5b2d62c3863e843dcf9987afabad285ae19256b7dbe8acb7dd6df2d
-size 57359279

 version https://git-lfs.github.com/spec/v1
+oid sha256:9a76f9af63a196fccfc13b6dab46ef46ac1ba1202c15ad38b7189b07ee6e62be
+size 57514565
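The wheel is stored with Git LFS, so the diff only shows the pointer file: a spec version, a SHA-256 object id and a size in bytes. A minimal sketch of parsing such a pointer and checking downloaded bytes against it follows; the helper names are hypothetical, and only the standard `hashlib` module is used.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer declares."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:58343f2fc5b2d62c3863e843dcf9987afabad285ae19256b7dbe8acb7dd6df2d
size 57359279
""")
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:9a76f9af63a196fccfc13b6dab46ef46ac1ba1202c15ad38b7189b07ee6e62be
size 57514565
""")

# Size difference between the old and new wheel, in bytes.
print(new_pointer["size"] - old_pointer["size"])
```

To verify a downloaded wheel, `matches_pointer(open(path, "rb").read(), new_pointer)` would be the corresponding call, with `path` pointing at the local file.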

meta.json CHANGED

@@ -2,13 +2,13 @@
   "lang":"da",
   "name":"dacy_small_trf",
   "version":"0.1.0",
-  "description":"DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines. DaCy's largest pipeline has achieved State-of-the-Art performance on Named entity recognition, part-of-speech tagging and dependency parsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. This repository also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.",
-  "author":"Kenneth Enevoldsen",
-  "email":"kenneth.enevoldsen@cas.au.dk",
-  "url":"https://centre-for-humanities-computing.github.io/DaCy/",
   "license":"Apache-2.0 License",
-  "spacy_version":">=3.1.0,<3.2.0",
-  "spacy_git_version":"530b5d72f",
   "vectors":{
     "width":0,
     "vectors":0,
@@ -243,248 +243,251 @@
   "disabled":[
   ],
   "performance":{
-    "pos_acc":0.1150828248,
-    "morph_acc":0.1095611741,
     "morph_per_feat":{
       "Mood":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Tense":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "VerbForm":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Voice":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Definite":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Gender":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Number":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "AdpType":{
-        "p":0.1556564822,
-        "r":1.0,
-        "f":0.2693819221
       },
       "PartType":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Case":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Person":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "PronType":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "NumType":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Degree":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Reflex":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Number[psor]":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Poss":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Foreign":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Abbr":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Style":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "Polite":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       }
     },
-    "dep_uas":0.1536466438,
-    "dep_las":0.0261424348,
     "dep_las_per_type":{
       "advmod":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "root":{
-        "p":0.0578034682,
-        "r":0.2659574468,
-        "f":0.0949667616
       },
       "nsubj":{
-        "p":0.0226640159,
-        "r":0.0601265823,
-        "f":0.032919434
       },
       "case":{
-        "p":0.0623268698,
-        "r":0.0444664032,
-        "f":0.0519031142
       },
       "obl":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "cc":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "conj":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "obj":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "aux":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "acl:relcl":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "obl:loc":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "det":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "amod":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "nmod:poss":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "ccomp":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "nummod":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "flat":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "compound:prt":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "advcl":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "mark":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "cop":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "dep":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "nmod":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "iobj":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
-      },
-      "list":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "xcomp":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "vocative":{
         "p":0.0,
@@ -492,24 +495,24 @@
         "f":0.0
       },
       "fixed":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
-      },
-      "appos":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "expl":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "obl:tmod":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "discourse":{
         "p":0.0,
@@ -517,39 +520,62 @@
         "f":0.0
       }
     },
-    "sents_p":0.0007698229,
-    "sents_r":0.0035460993,
-    "sents_f":0.0012650221,
     "lemma_acc":0.8491041162,
-    "ents_f":0.0076157001,
-    "ents_p":0.0040957782,
-    "ents_r":0.0541666667,
     "ents_per_type":{
-      "ORG":{
-        "p":0.0040957782,
-        "r":0.2888888889,
-        "f":0.0080770426
-      },
       "PER":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "MISC":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       },
       "LOC":{
-        "p":0.0,
-        "r":0.0,
-        "f":0.0
       }
     },
-    "transformer_loss":0.0,
-    "morphologizer_loss":274.2421875,
-    "parser_loss":424.0338255167,
-    "ner_loss":262.4495080709
   },
-  "notes":"This is a test"
 }

   "lang":"da",
   "name":"dacy_small_trf",
   "version":"0.1.0",
+  "description":"\n<a href=\"https://github.com/centre-for-humanities-computing/Dacy\"><img src=\"https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png\" width=\"175\" height=\"175\" align=\"right\" /></a>\n\n# DaCy small transformer\n\nDaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.\nDaCy's largest pipeline has achieved State-of-the-Art performance on Named entity recognition, part-of-speech tagging and dependency \nparsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. \nDaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.\n    ",
+  "author":"Centre for Humanities Computing Aarhus",
+  "email":"Kenneth.enevoldsen@cas.au.dk",
+  "url":"https://chcaa.io/#/",
   "license":"Apache-2.0 License",
+  "spacy_version":">=3.1.1,<3.2.0",
+  "spacy_git_version":"ffaead8fe",
   "vectors":{
     "width":0,
     "vectors":0,
   "disabled":[
   ],
+  "_sourced_vectors_hashes":{
+  },
   "performance":{
+    "pos_acc":0.9583030655,
+    "morph_acc":0.9570439246,
     "morph_per_feat":{
       "Mood":{
+        "p":0.9950690335,
+        "r":0.9618684461,
+        "f":0.9781871062
       },
       "Tense":{
+        "p":0.9859922179,
+        "r":0.9540662651,
+        "f":0.9697665519
       },
       "VerbForm":{
+        "p":0.9823343849,
+        "r":0.952876377,
+        "f":0.9673811743
       },
       "Voice":{
+        "p":0.9938414165,
+        "r":0.9648729447,
+        "f":0.9791429655
       },
       "Definite":{
+        "p":0.9872480461,
+        "r":0.9482418017,
+        "f":0.9673518742
       },
       "Gender":{
+        "p":0.9793956044,
+        "r":0.9478231971,
+        "f":0.9633507853
       },
       "Number":{
+        "p":0.985179197,
+        "r":0.9535732916,
+        "f":0.9691186216
       },
       "AdpType":{
+        "p":1.0,
+        "r":0.9752431477,
+        "f":0.9874664279
       },
       "PartType":{
+        "p":1.0,
+        "r":0.9675324675,
+        "f":0.9834983498
       },
       "Case":{
+        "p":0.9934640523,
+        "r":0.9605055292,
+        "f":0.9767068273
       },
       "Person":{
+        "p":0.9908925319,
+        "r":0.9662522202,
+        "f":0.9784172662
       },
       "PronType":{
+        "p":0.9941077441,
+        "r":0.9712171053,
+        "f":0.9825291181
       },
       "NumType":{
+        "p":0.9791666667,
+        "r":0.9337748344,
+        "f":0.9559322034
       },
       "Degree":{
+        "p":0.9726708075,
+        "r":0.943373494,
+        "f":0.9577981651
       },
       "Reflex":{
+        "p":1.0,
+        "r":1.0,
+        "f":1.0
       },
       "Number[psor]":{
+        "p":1.0,
+        "r":0.988372093,
+        "f":0.9941520468
       },
       "Poss":{
+        "p":1.0,
+        "r":0.9772727273,
+        "f":0.9885057471
       },
       "Foreign":{
+        "p":0.8888888889,
+        "r":0.8,
+        "f":0.8421052632
       },
       "Abbr":{
+        "p":1.0,
+        "r":0.4,
+        "f":0.5714285714
       },
       "Style":{
+        "p":1.0,
+        "r":1.0,
+        "f":1.0
       },
       "Polite":{
+        "p":0.3333333333,
+        "r":0.25,
+        "f":0.2857142857
       }
     },
+    "dep_uas":0.8492442546,
+    "dep_las":0.8176199573,
     "dep_las_per_type":{
       "advmod":{
+        "p":0.7724637681,
+        "r":0.7528248588,
+        "f":0.7625178827
       },
       "root":{
+        "p":0.8561403509,
+        "r":0.865248227,
+        "f":0.860670194
       },
       "nsubj":{
+        "p":0.8939393939,
+        "r":0.8713080169,
+        "f":0.8824786325
       },
       "case":{
+        "p":0.9141414141,
+        "r":0.8942687747,
+        "f":0.9040959041
       },
       "obl":{
+        "p":0.7286585366,
+        "r":0.7433903577,
+        "f":0.7359507313
       },
       "cc":{
+        "p":0.8486646884,
+        "r":0.8313953488,
+        "f":0.8399412628
       },
       "conj":{
+        "p":0.671957672,
+        "r":0.6773333333,
+        "f":0.6746347942
       },
       "obj":{
+        "p":0.8560747664,
+        "r":0.8893203883,
+        "f":0.8723809524
       },
       "aux":{
+        "p":0.8885542169,
+        "r":0.860058309,
+        "f":0.8740740741
       },
       "acl:relcl":{
+        "p":0.6936416185,
+        "r":0.6486486486,
+        "f":0.6703910615
       },
       "obl:loc":{
+        "p":0.7222222222,
+        "r":0.7428571429,
+        "f":0.7323943662
       },
       "det":{
+        "p":0.9346733668,
+        "r":0.9192751236,
+        "f":0.926910299
       },
       "amod":{
+        "p":0.8549488055,
+        "r":0.8549488055,
+        "f":0.8549488055
       },
       "nmod:poss":{
+        "p":0.75,
+        "r":0.7128712871,
+        "f":0.730964467
       },
       "ccomp":{
+        "p":0.6885245902,
+        "r":0.6774193548,
+        "f":0.6829268293
       },
       "nummod":{
+        "p":0.8181818182,
+        "r":0.825,
+        "f":0.8215767635
       },
       "flat":{
+        "p":0.8636363636,
+        "r":0.880794702,
+        "f":0.8721311475
       },
       "compound:prt":{
+        "p":0.6551724138,
+        "r":0.4634146341,
+        "f":0.5428571429
       },
       "advcl":{
+        "p":0.6967213115,
+        "r":0.7327586207,
+        "f":0.7142857143
       },
       "mark":{
+        "p":0.9018789144,
+        "r":0.887063655,
+        "f":0.8944099379
       },
       "cop":{
+        "p":0.8514285714,
+        "r":0.8514285714,
+        "f":0.8514285714
       },
       "dep":{
+        "p":0.1960784314,
+        "r":0.3773584906,
+        "f":0.2580645161
       },
       "nmod":{
+        "p":0.7197452229,
+        "r":0.662109375,
+        "f":0.6897253306
       },
       "iobj":{
+        "p":0.7333333333,
+        "r":0.5,
+        "f":0.5945945946
       },
       "xcomp":{
+        "p":0.6315789474,
+        "r":0.406779661,
+        "f":0.4948453608
+      },
+      "list":{
+        "p":0.3636363636,
+        "r":0.2222222222,
+        "f":0.275862069
       },
       "vocative":{
         "p":0.0,
         "f":0.0
       },
       "fixed":{
+        "p":0.8947368421,
+        "r":0.8095238095,
+        "f":0.85
       },
       "expl":{
+        "p":0.9090909091,
+        "r":0.8823529412,
+        "f":0.8955223881
+      },
+      "appos":{
+        "p":0.6097560976,
+        "r":0.7575757576,
+        "f":0.6756756757
       },
       "obl:tmod":{
+        "p":0.8,
+        "r":0.2222222222,
+        "f":0.347826087
       },
       "discourse":{
         "p":0.0,
         "f":0.0
       }
     },
+    "sents_p":0.8603839442,
+    "sents_r":0.8741134752,
+    "sents_f":0.8671943712,
     "lemma_acc":0.8491041162,
+    "ents_f":0.8231644261,
+    "ents_p":0.81724846,
+    "ents_r":0.8291666667,
     "ents_per_type":{
       "PER":{
+        "p":0.9290322581,
+        "r":0.8674698795,
+        "f":0.8971962617
+      },
+      "ORG":{
+        "p":0.7619047619,
+        "r":0.7111111111,
+        "f":0.7356321839
       },
       "MISC":{
+        "p":0.6739130435,
+        "r":0.8230088496,
+        "f":0.7410358566
       },
       "LOC":{
+        "p":0.8818181818,
+        "r":0.8738738739,
+        "f":0.8778280543
       }
     },
+    "transformer_loss":417466.8663170633,
+    "morphologizer_loss":34589.6649030063,
+    "parser_loss":151048.9837691551,
+    "ner_loss":5460.9844742843
   },
+  "sources":[
+    {
+      "name":"UD Danish DDT v2.5",
+      "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
+      "license":"CC BY-SA 4.0",
+      "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
+    },
+    {
+      "name":"DaNE",
+      "url":"https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane",
+      "license":"CC BY-SA 4.0",
+      "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
+    },
+    {
+      "name":"Maltehb/-l-ctra-danish-electra-small-cased",
+      "author":"Malte H\u00f8jmark-Bertelsen",
+      "url":"https://huggingface.co/Maltehb/-l-ctra-danish-electra-small-cased",
+      "license":"CC BY 4.0"
+    }
+  ],
+  "requirements":[
+    "spacy-transformers>=1.0.3,<1.1.0"
+  ],
+  "notes":"\n## Bias and Robustness\n\nBesides the validation done by spaCy on the DaNE test set, DaCy also provides a series of augmentations to the DaNE test set to see how well the models deal with these types of augmentations.\nThese can be seen as behavioural probes akin to the NLP checklist.\n\n### Deterministic Augmentations\nDeterministic augmentations are augmentations which always yield the same result.\n\n| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) |\u00a0Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |\n| --- | --- | --- | --- |  --- | --- | --- |  --- |\n| No augmentation | 0.98 | 0.974 | 0.868 |  0.836 | 0.936 | 0.844 | 0.765 |\n| \u00c6\u00f8\u00e5 Augmentation | 0.955 | 0.948 | 0.823 |  0.783 | 0.922 | 0.754 | 0.718 |\n| Lowercase | 0.974 | 0.97 | 0.862 |  0.828 | 0.905 | 0.848 | 0.681 |\n| No Spacing | 0.229 | 0.229 | 0.004 |  0.003 | 0.824 | 0.225 | 0.048 |\n| Abbreviated first names | 0.979 | 0.973 | 0.864 |  0.832 | 0.94 | 0.845 | 0.699 |\n| Input size augmentation 5 sentences | 0.956 | 0.956 | 0.851 |  0.818 | 0.883 | 0.844 | 0.743 |\n| Input size augmentation 10 sentences | 0.959 | 0.958 | 0.853 |  0.821 | 0.897 | 0.844 | 0.755 |\n\n\n\n### Stochastic Augmentations\nStochastic augmentations are augmentations which are repeated multiple times to estimate the effect of the augmentation.\n\n| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) |\u00a0Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |\n| --- | --- | --- | --- |  --- | --- | --- |  --- |\n| Keystroke errors 2% | 0.931 (0.003) | 0.929 (0.003) | 0.797 (0.003) |  0.753 (0.003) | 0.884 (0.003) | 0.772 (0.003) | 0.657 (0.003) |\n| Keystroke errors 5% | 0.859 (0.003) | 0.863 (0.003) | 0.699 (0.003) |  0.641 (0.003) | 0.824 (0.003) | 
0.681 (0.003) | 0.53 (0.003) |\n| Keystroke errors 15% | 0.633 (0.006) | 0.662 (0.006) | 0.439 (0.006) |  0.358 (0.006) | 0.688 (0.006) | 0.459 (0.006) | 0.293 (0.006) |\n| Danish names | 0.979 (0.0) | 0.974 (0.0) | 0.867 (0.0) |  0.835 (0.0) | 0.943 (0.0) | 0.847 (0.0) | 0.748 (0.0) |\n| Muslim names | 0.979 (0.0) | 0.974 (0.0) | 0.865 (0.0) |  0.833 (0.0) | 0.94 (0.0) | 0.847 (0.0) | 0.732 (0.0) |\n| Female names | 0.979 (0.0) | 0.974 (0.0) | 0.867 (0.0) |  0.835 (0.0) | 0.946 (0.0) | 0.847 (0.0) | 0.754 (0.0) |\n| Male names | 0.979 (0.0) | 0.974 (0.0) | 0.867 (0.0) |  0.835 (0.0) | 0.943 (0.0) | 0.847 (0.0) | 0.748 (0.0) |\n| Spacing Augmentation 5% | 0.941 (0.002) | 0.936 (0.002) | 0.755 (0.002) |  0.725 (0.002) | 0.907 (0.002) | 0.811 (0.002) | 0.699 (0.002) |\n\n<details>\n\n<summary> Description of Augmenters </summary>\n\n    \n\n**No augmentation:**\nApplies no augmentation to the DaNE test set.\n\n**\u00c6\u00f8\u00e5 Augmentation:**\nThis augmentation replaces \u00e6, \u00f8, and \u00e5 with their spelling variants ae, oe, and aa, respectively.\n\n**Lowercase:**\nThis augmentation lowercases all text.\n\n**No Spacing:**\nThis augmentation removes all spacing from the text.\n\n**Abbreviated first names:**\nThis augmentation abbreviates the first names of entities. For instance 'Kenneth Enevoldsen' would turn to 'K. Enevoldsen'.\n\n**Keystroke errors 2%:**\nThis augmentation simulates keystroke errors by replacing 2% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Keystroke errors 5%:**\nThis augmentation simulates keystroke errors by replacing 5% of keys with a neighbouring key on a Danish QWERTY keyboard. 
As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Keystroke errors 15%:**\nThis augmentation simulates keystroke errors by replacing 15% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Danish names:**\nThis augmentation replaces all names with Danish names derived from Danmarks Statistik (2021). As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Muslim names:**\nThis augmentation replaces all names with Muslim names derived from Meldgaard (2005). As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Female names:**\nThis augmentation replaces all names with Danish female names derived from Danmarks Statistik (2021). As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Male names:**\nThis augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n\n**Spacing Augmentation 5%:**\nThis augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parentheses.\n </details> \n <br /> \n\n\n### Hardware\nThis was run and trained on a Quadro RTX 8000 GPU."
 }
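
The per-type entity scores in `meta.json` report precision (`p`), recall (`r`), and F-score (`f`) separately. A minimal sketch for sanity-checking that each reported `f` is the harmonic mean of its `p` and `r` follows. The values are copied from the `ents_per_type` block above; the helper names `f_score` and `check_scores` and the tolerance are illustrative assumptions, not part of spaCy or DaCy.

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Values copied from the "ents_per_type" block of meta.json in this diff.
ents_per_type = {
    "PER": {"p": 0.9290322581, "r": 0.8674698795, "f": 0.8971962617},
    "ORG": {"p": 0.7619047619, "r": 0.7111111111, "f": 0.7356321839},
    "MISC": {"p": 0.6739130435, "r": 0.8230088496, "f": 0.7410358566},
    "LOC": {"p": 0.8818181818, "r": 0.8738738739, "f": 0.8778280543},
}


def check_scores(scores: dict, tol: float = 1e-6) -> dict:
    """Map each label to True if its reported f matches 2pr/(p+r) within tol."""
    return {
        label: abs(f_score(s["p"], s["r"]) - s["f"]) <= tol
        for label, s in scores.items()
    }


if __name__ == "__main__":
    for label, ok in check_scores(ents_per_type).items():
        print(f"{label}: {'consistent' if ok else 'MISMATCH'}")
```

The same check can be pointed at the dependency-label block (`dep_las_per_type` style entries) by loading `meta.json` with `json.load` instead of hard-coding the values.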

morphologizer/model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e6593f44c84807d9093ae279607d28e4a3830cac3bd957ffa700d9f1992be852
 size 161992

 version https://git-lfs.github.com/spec/v1
+oid sha256:601cec06d7bb6f1e2025cf6878f5c8fb02d89b5fc71ba82c80e718a28c63f87f
 size 161992

ner/model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cce84a8f6f8737880302491bc844be40901a9e33a7a8091647dd35b087c72ce3
 size 94890

 version https://git-lfs.github.com/spec/v1
+oid sha256:6c7bd95a31a59f7cb632de4a99c12643602828d312d04a7ba233f3bdb7f15778
 size 94890

parser/model CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d3df01ad2eae13f30f68b5ebe46564c0b167ce1026da37052967675a3b7f8438
 size 325085

 version https://git-lfs.github.com/spec/v1
+oid sha256:db9711e97c156d5c9892a65b87d6a185289f74b92dcec527cf6906dfb6e821a6
 size 325085

transformer/model/pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0b7a4a6fc863a3b3bd76b8b040ae7925f5a20d8d6987140b037ca9791b06ac0a
 size 54773654

 version https://git-lfs.github.com/spec/v1
+oid sha256:d65643fe23c672180685635b539688406638af1f7e515cb89505ea7626127400
 size 54773654

transformer/model/tokenizer_config.json CHANGED

	@@ -1 +1 @@
1	- {"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": false, "special_tokens_map_file": null, "full_tokenizer_file": null, "model_max_length": 128, "name_or_path": "Maltehb/-l-ctra-danish-electra-small-cased", "do_basic_tokenize": true, "never_split": null}


1	+ {"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "special_tokens_map_file": null, "full_tokenizer_file": null, "model_max_length": 128, "name_or_path": "Maltehb/-l-ctra-danish-electra-small-cased", "do_basic_tokenize": true, "never_split": null}
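
The two versions of `tokenizer_config.json` are single long lines, so the change is easy to miss by eye. The sketch below parses both versions with the standard library and reports only the keys whose values differ. The helper name `changed_keys` is an illustrative assumption; the two JSON strings are copied from the removed and added lines above.

```python
import json

# Removed ("-") and added ("+") versions of tokenizer_config.json from this diff.
OLD_CONFIG = (
    '{"do_lower_case": false, "unk_token": "[UNK]", "sep_token": "[SEP]", '
    '"pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", '
    '"tokenize_chinese_chars": true, "strip_accents": false, '
    '"special_tokens_map_file": null, "full_tokenizer_file": null, '
    '"model_max_length": 128, '
    '"name_or_path": "Maltehb/-l-ctra-danish-electra-small-cased", '
    '"do_basic_tokenize": true, "never_split": null}'
)
NEW_CONFIG = OLD_CONFIG.replace('"strip_accents": false', '"strip_accents": null')

_MISSING = object()  # distinguishes an absent key from an explicit null


def changed_keys(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    diff = {}
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key, _MISSING)
        after = new.get(key, _MISSING)
        if before is not after and before != after:
            diff[key] = (
                None if before is _MISSING else before,
                None if after is _MISSING else after,
            )
    return diff


if __name__ == "__main__":
    print(changed_keys(json.loads(OLD_CONFIG), json.loads(NEW_CONFIG)))
```

Only `strip_accents` differs, changing from `false` to `null`. In Hugging Face BERT-style tokenizers, an unset `strip_accents` is documented to follow the lowercasing setting; with `do_lower_case` set to `false` here, accent stripping would presumably remain disabled, but this is worth confirming against the installed `transformers` version.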

vocab/strings.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:86381420bcac876c95ffecbd0b41da7e614440eef239354586defa5b5a5e9735
-size 457618

 version https://git-lfs.github.com/spec/v1
+oid sha256:5b50a86603f748496e4fd87a8aaa203a32bf82d4b3768bf54187ff40de3ca6f9
+size 460120
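
The model files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, a `sha256` object ID, and the size in bytes. A minimal sketch for parsing such a pointer and checking a downloaded file against it is shown below. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative assumptions, and the pointer text is the new `vocab/strings.json` entry from this diff with the leading diff markers removed.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that raw file contents have the size and SHA-256 digest the pointer names."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# New pointer for vocab/strings.json, copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:5b50a86603f748496e4fd87a8aaa203a32bf82d4b3768bf54187ff40de3ca6f9
size 460120
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], pointer["size"])
    # With the real file on disk (path is hypothetical):
    # with open("vocab/strings.json", "rb") as fh:
    #     print(matches_pointer(fh.read(), pointer))
```

For large files such as `pytorch_model.bin`, reading in chunks and updating the hash incrementally would avoid loading the whole file into memory.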