---
tags:
- spacy
- token-classification
language:
- da
license: apache-2.0
model-index:
- name: da_dacy_small_DANSK_ner
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.7718478986
    - name: NER Recall
      type: recall
      value: 0.7728790915
    - name: NER F Score
      type: f_score
      value: 0.7723631509
---

<a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>

# DaCy_small_DANSK_ner

DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analyzing Danish pipelines.
At the time of publishing this model, DaCy also includes the only models for fine-grained NER trained on the DANSK dataset, a dataset containing 18 annotation types in the same format as OntoNotes.
Moreover, DaCy's largest pipeline has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging, and dependency parsing for Danish on the DaNE dataset.
Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
DaCy also contains guides on using the package as well as behavioural tests for biases and robustness of Danish NLP pipelines.

| Feature | Description |
| --- | --- |
| **Name** | `da_dacy_small_DANSK_ner` |
| **Version** | `0.1.0` |
| **spaCy** | `>=3.5.0,<3.6.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | DANSK - Danish Annotations for NLP Specific TasKs<br />[jonfd/electra-small-nordic](https://huggingface.co/jonfd/electra-small-nordic) (Jón Daðason) |
| **License** | `apache-2.0` |
| **Author** | [Centre for Humanities Computing Aarhus](https://chcaa.io/#/) |

### Label Scheme

<details>

<summary>View label scheme (18 labels for 1 component)</summary>

| Component | Labels |
| --- | --- |
| **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FACILITY`, `GPE`, `LANGUAGE`, `LAW`, `LOCATION`, `MONEY`, `NORP`, `ORDINAL`, `ORGANIZATION`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK OF ART` |

</details>

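The pipeline described above can be used like any packaged spaCy pipeline. The following is a minimal sketch, assuming the `da_dacy_small_DANSK_ner` package and a compatible spaCy version (`>=3.5.0,<3.6.0`) are already installed; the helper names `load_pipeline` and `extract_entities` are hypothetical and not part of DaCy or spaCy.

```python
def load_pipeline(name="da_dacy_small_DANSK_ner"):
    """Load the packaged pipeline (assumes it is installed in the environment)."""
    # spaCy is imported lazily so the helper below works without it installed.
    import spacy

    return spacy.load(name)


def extract_entities(nlp, text, labels=None):
    """Return (entity text, label) pairs, optionally restricted to some labels.

    `nlp` is any callable returning an object with an `ents` attribute,
    such as a loaded spaCy pipeline.
    """
    doc = nlp(text)
    return [
        (ent.text, ent.label_)
        for ent in doc.ents
        if labels is None or ent.label_ in labels
    ]
```

With the pipeline installed, `extract_entities(load_pipeline(), "Jens Hansen bor i Aarhus.", labels={"PERSON", "GPE"})` would return only the entities tagged with those two labels from the label scheme.
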
### Accuracy

| Type | Score |
| --- | --- |
| `ENTS_F` | 77.24 |
| `ENTS_P` | 77.18 |
| `ENTS_R` | 77.29 |
| `TRANSFORMER_LOSS` | 80975.57 |
| `NER_LOSS` | 90852.49 |

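The F-score reported above is consistent with the harmonic mean of precision and recall. As a small check, using the unrounded values from this card's metadata (the function name `f_score` is just illustrative):

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values from the model-index metadata of this card.
precision = 0.7718478986
recall = 0.7728790915

f1 = f_score(precision, recall)
print(f"{100 * f1:.2f}")  # prints 77.24, matching ENTS_F
```
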
### Performance tables

The table below shows the F1-scores of the three DaCy fine-grained models, overall and per domain.

| Domain | DaCy large | DaCy medium | DaCy small |
|:---:|:---:|:---:|:---:|
| All domains combined | 0.823 | 0.806 | 0.776 |
| Conversation | 0.796 | 0.718 | 0.82 |
| Dannet | 0.75 | 0.667 | 1 |
| Legal | 0.852 | 0.854 | 0.866 |
| News | 0.841 | 0.759 | 0.86 |
| Social Media | 0.793 | 0.847 | 0.8 |
| Web | 0.826 | 0.802 | 0.756 |
| Wiki and Books | 0.778 | 0.838 | 0.709 |


| Metric | DaCy large | DaCy medium | DaCy small |
|:---:|:---:|:---:|:---:|
| F1-score | 0.823 | 0.806 | 0.776 |
| Recall | 0.834 | 0.818 | 0.77 |
| Precision | 0.813 | 0.794 | 0.781 |