---
tags:
- spacy
- token-classification
language:
- da
license: apache-2.0
model-index:
- name: da_dacy_medium_ner_fine_grained
  results:
  - task:
      name: NER
      type: token-classification
    metrics:
    - name: NER Precision
      type: precision
      value: 0.7937743191
    - name: NER Recall
      type: recall
      value: 0.8176352705
    - name: NER F Score
      type: f_score
      value: 0.8055281343
datasets:
- chcaa/DANSK
---
<a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>
# DaCy_medium_ner_fine_grained
DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analyzing Danish pipelines.
At the time of publishing this model, DaCy also includes the only models for fine-grained NER trained on the DANSK dataset, a dataset containing 18 annotation types in the same format as OntoNotes.
Moreover, DaCy's largest pipeline has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging and dependency parsing for Danish on the DaNE dataset.
Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
DaCy also contains guides on using the package as well as behavioural tests for biases and robustness of Danish NLP pipelines.
For information about this model and guides to its use, please refer to [DaCy's documentation](https://centre-for-humanities-computing.github.io/DaCy/using_dacy.html).
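
As a rough illustration of what using the pipeline might look like, the sketch below loads it through spaCy and groups the predicted entities by label. It assumes the `da_dacy_medium_ner_fine_grained` package and `spacy-transformers` are already installed in a compatible environment; the helper names `load_pipeline` and `entities_by_label` and the example sentence are made up for this sketch and are not part of DaCy. The documentation linked above remains the authoritative guide.

```python
from collections import defaultdict


def load_pipeline(name="da_dacy_medium_ner_fine_grained"):
    """Load the pipeline by package name (assumes it is already installed)."""
    import spacy  # imported lazily so the helper below works without spaCy

    return spacy.load(name)


def entities_by_label(doc):
    """Group entity texts by label for any object exposing spaCy-style `.ents`."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Example (needs the installed pipeline, so it is left commented out):
# nlp = load_pipeline()
# doc = nlp("Mette Frederiksen besøgte Aarhus Universitet den 3. maj 2023.")
# for label, texts in entities_by_label(doc).items():
#     print(label, texts)
```

The grouping helper only relies on `doc.ents`, `ent.text` and `ent.label_`, so it should work unchanged with any spaCy pipeline that has an `ner` component.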
| Feature | Description |
| --- | --- |
| **Name** | `da_dacy_medium_ner_fine_grained` |
| **Version** | `0.1.0` |
| **spaCy** | `>=3.5.0,<3.6.0` |
| **Default Pipeline** | `transformer`, `ner` |
| **Components** | `transformer`, `ner` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | [DANSK - Danish Annotations for NLP Specific TasKs](https://huggingface.co/datasets/chcaa/DANSK) (chcaa)<br />[vesteinn/DanskBERT](https://huggingface.co/vesteinn/DanskBERT) (Vésteinn Snæbjarnarson) |
| **License** | `apache-2.0` |
| **Author** | [Centre for Humanities Computing Aarhus](https://chcaa.io/#/) |
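
The spaCy constraint in the table (`>=3.5.0,<3.6.0`) can be checked before loading the pipeline. The following is a simplified sketch that compares only the leading major, minor and patch numbers; it is not a full PEP 440 implementation, and the function names are hypothetical.

```python
import re


def parse_version(version):
    """Return the leading (major, minor, patch) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_supported_spacy(version, lower=(3, 5, 0), upper=(3, 6, 0)):
    """True if lower <= version < upper, mirroring `>=3.5.0,<3.6.0`."""
    return lower <= parse_version(version) < upper


print(is_supported_spacy("3.5.4"))  # inside the supported range
print(is_supported_spacy("3.6.0"))  # outside: the upper bound is exclusive
```

In an environment with spaCy installed, passing `spacy.__version__` to `is_supported_spacy` would perform the same check against the installed version.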
### Label Scheme
<details>
<summary>View label scheme (18 labels for 1 component)</summary>
| Component | Labels |
| --- | --- |
| **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FACILITY`, `GPE`, `LANGUAGE`, `LAW`, `LOCATION`, `MONEY`, `NORP`, `ORDINAL`, `ORGANIZATION`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK OF ART` |
</details>
### Accuracy
| Type | Score |
| --- | --- |
| `ENTS_F` | 80.55 |
| `ENTS_P` | 79.38 |
| `ENTS_R` | 81.76 |
| `TRANSFORMER_LOSS` | 17272.76 |
| `NER_LOSS` | 82090.19 |
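
The F score is the harmonic mean of precision and recall, so `ENTS_F` can be recomputed from the other two values. The short check below uses the unrounded precision and recall reported in this model card's metadata; the function name `f_score` is just an illustrative helper.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.7937743191  # NER Precision from the model card metadata
recall = 0.8176352705     # NER Recall from the model card metadata

# Scale to a percentage to compare with the ENTS_F row of the table.
print(round(f_score(precision, recall) * 100, 2))  # → 80.55
```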
### Training
For the progression of loss and performance on the dev set during training, please refer to the [Weights and Biases run](https://wandb.ai/emil-tj/dacy-an-efficient-pipeline-for-danish/runs/eg0wvam7?workspace=user-emil-tj).