DaCy_large_DANSK_ner

DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analyzing Danish pipelines. At the time of publishing this model, DaCy also includes the only models for fine-grained NER trained on the DANSK dataset, a dataset containing 18 annotation types in the same format as OntoNotes. Moreover, DaCy's largest pipeline has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging and dependency parsing for Danish on the DaNE dataset. Check out the DaCy repository for material on how to use DaCy and reproduce the results. DaCy also contains guides on usage of the package as well as behavioural tests for biases and robustness of Danish NLP pipelines.

Feature	Description
Name	`da_dacy_large_DANSK_ner`
Version	`0.1.0`
spaCy	`>=3.5.0,<3.6.0`
Default Pipeline	`transformer`, `ner`
Components	`transformer`, `ner`
Vectors	0 keys, 0 unique vectors (0 dimensions)
Sources	DANSK - Danish Annotations for NLP Specific TasKs; KennethEnevoldsen/dfm-bert-large-v1-2048bsz-1Msteps (Kenneth Enevoldsen)
License	`apache-2.0`
Author	Centre for Humanities Computing Aarhus
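
The following is a minimal usage sketch, not an official DaCy example. It assumes the pipeline has been installed as a Python package under the name listed in the table (`da_dacy_large_DANSK_ner`), in an environment with a compatible spaCy version and `spacy-transformers` (needed for the `transformer` component). The helper names `load_pipeline`, `extract_entities` and `main`, as well as the example sentence, are illustrative and not part of DaCy.

```python
def load_pipeline(name="da_dacy_large_DANSK_ner"):
    """Load the installed pipeline package with spaCy (imported lazily)."""
    import spacy  # the table above lists spaCy >=3.5.0,<3.6.0

    return spacy.load(name)


def extract_entities(nlp, text):
    """Run the pipeline on `text` and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    nlp = load_pipeline()
    # Illustrative sentence (an assumption); predictions depend on the model.
    text = "Mette Frederiksen besøgte Aarhus Universitet den 3. maj."
    for ent_text, label in extract_entities(nlp, text):
        print(f"{label:<14}{ent_text}")
```

Calling `main()` once the package is installed prints one line per predicted entity, with labels drawn from the label scheme of the `ner` component.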

Label Scheme

18 labels for 1 component

Component	Labels
`ner`	`CARDINAL`, `DATE`, `EVENT`, `FACILITY`, `GPE`, `LANGUAGE`, `LAW`, `LOCATION`, `MONEY`, `NORP`, `ORDINAL`, `ORGANIZATION`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK OF ART`

Accuracy

Type	Score
`ENTS_F`	81.51
`ENTS_P`	81.00
`ENTS_R`	82.03
`TRANSFORMER_LOSS`	63375.61
`NER_LOSS`	158164.20
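
As a quick consistency check, `ENTS_F` can be recomputed from `ENTS_P` and `ENTS_R` as the harmonic mean of precision and recall. The sketch below uses only the values from the table above; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ents_p, ents_r = 81.00, 82.03  # ENTS_P and ENTS_R from the table above
ents_f = f1_score(ents_p, ents_r)
print(round(ents_f, 2))  # 81.51, matching ENTS_F
```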

Performance tables

The table below shows the F1, recall and precision of the three DaCy fine-grained models.

Score	DaCy large	DaCy medium	DaCy small
F1	0.823	0.806	0.776
Recall	0.834	0.818	0.77
Precision	0.813	0.794	0.781

The table below shows the F1 of the three DaCy fine-grained models within each named entity type.

Named-entity type	DaCy large	DaCy medium	DaCy small
CARDINAL	0.874	0.781	0.887
DATE	0.846	0.859	0.867
EVENT	0.611	0.571	0.4
FACILITY	0.545	0.533	0.468
GPE	0.893	0.838	0.794
LANGUAGE	0.902	0.486	0.194
LAW	0.686	0.625	0.606
LOCATION	0.633	0.737	0.581
MONEY	0.993	1	0.947
NORP	0.78	0.887	0.785
ORDINAL	0.696	0.7	0.727
ORGANIZATION	0.863	0.851	0.781
PERCENT	0.923	0.96	0.96
PERSON	0.871	0.872	0.833
PRODUCT	0.671	0.635	0.526
QUANTITY	0.386	0.654	0.708
TIME	0.643	0.571	0.71
WORK OF ART	0.494	0.639	0.488
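
When reading the per-type scores, note that their unweighted mean need not equal the overall F1, since entity types differ in frequency. The sketch below computes the unweighted (macro) mean of the DaCy large column above. How the overall 0.823 figure was averaged is not stated here; that it is computed over all entities jointly is an assumption.

```python
# Per-type F1 for DaCy large, copied from the table above.
large_f1 = {
    "CARDINAL": 0.874, "DATE": 0.846, "EVENT": 0.611, "FACILITY": 0.545,
    "GPE": 0.893, "LANGUAGE": 0.902, "LAW": 0.686, "LOCATION": 0.633,
    "MONEY": 0.993, "NORP": 0.78, "ORDINAL": 0.696, "ORGANIZATION": 0.863,
    "PERCENT": 0.923, "PERSON": 0.871, "PRODUCT": 0.671, "QUANTITY": 0.386,
    "TIME": 0.643, "WORK OF ART": 0.494,
}

# Unweighted mean: every entity type counts equally, regardless of frequency.
macro_f1 = sum(large_f1.values()) / len(large_f1)

# Types with the lowest scores pull the macro mean down the most.
weakest = sorted(large_f1, key=large_f1.get)[:3]

print(f"macro F1: {macro_f1:.3f}")
print("weakest types:", weakest)
```

The macro mean comes out lower than the overall score, which is consistent with the weakest types (`QUANTITY`, `WORK OF ART`, `FACILITY`) contributing less to an overall score than they do to an unweighted average.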

The table below shows the F1 of the three DaCy fine-grained models within each domain of texts in DANSK.

Domain	DaCy large	DaCy medium	DaCy small
All domains combined	0.823	0.806	0.776
Conversation	0.796	0.718	0.82
Dannet	0.75	0.667	1
Legal	0.852	0.854	0.866
News	0.841	0.759	0.86
Social Media	0.793	0.847	0.8
Web	0.826	0.802	0.756
Wiki and Books	0.778	0.838	0.709
