---
tags:
- flair
- token-classification
- sequence-tagger-model
language: da
datasets:
- DaNE
widget:
- text: "Jens Peter Hansen kommer fra Danmark"
---

# Danish NER in Flair (default model)

This is the standard 4-class NER model for Danish that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **81.78** (DaNE)
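
The F1-Score above is computed over entity spans. As a minimal sketch, assuming the common CoNLL-style convention in which a predicted entity counts as correct only if both its token boundaries and its label match a gold entity exactly (the evaluation script behind this particular number is not specified here), micro-averaged span F1 can be computed as follows. The spans in the example are hypothetical toy data.

```python
from typing import Set, Tuple

# A span is (first_token, last_token, label); all three must match exactly.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Micro-averaged exact-match span F1, returned as a percentage."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one exact match, one span with the wrong label,
# and one gold span the tagger missed entirely.
gold = {(1, 3, "PER"), (6, 6, "LOC"), (8, 9, "ORG")}
predicted = {(1, 3, "PER"), (6, 6, "ORG")}

print(f"{span_f1(gold, predicted):.2f}")  # precision 1/2, recall 1/3 -> 40.00
```

Under this convention the mislabeled span `(6, 6, "ORG")` counts against both precision (a wrong prediction) and recall (the gold `LOC` span is still missed).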

Predicts 4 tags:

| **tag** | **meaning** |
|---------|-------------|
| PER | person name |
| LOC | location name |
| ORG | organization name |
| MISC | other name |

Based on Transformer embeddings and LSTM-CRF.
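
In an LSTM-CRF tagger, the LSTM scores every candidate tag for each token, and the CRF layer adds scores for transitions between neighbouring tags; at prediction time the highest-scoring whole tag sequence is found with Viterbi decoding. The framework-free sketch below illustrates only that decoding step. It assumes a BIO-style encoding of the entity classes, uses hand-picked (hypothetical) emission and transition scores in place of a trained model, and omits the start/stop transitions, batching and tensor code of Flair's actual implementation.

```python
from typing import Dict, List

Scores = Dict[str, float]


def viterbi_decode(emissions: List[Scores], transitions: Dict[str, Scores]) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score.

    emissions[i][tag]: per-token score for `tag` (stands in for the LSTM output)
    transitions[a][b]: CRF score for tag `a` being followed by tag `b`
    """
    tags = list(emissions[0])
    best = [dict(emissions[0])]  # best[i][t]: best score of a path ending in t at token i
    backpointers: List[Dict[str, str]] = []

    for i in range(1, len(emissions)):
        scores: Scores = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[i][cur]
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)

    # Start from the best final tag and follow the back-pointers to the start.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for the tokens "Jens Hansen kommer".
emissions = [
    {"O": 0.6, "B-PER": 0.5, "I-PER": 0.0},
    {"O": 0.1, "B-PER": 0.2, "I-PER": 0.9},
    {"O": 1.0, "B-PER": 0.0, "I-PER": 0.1},
]
# O -> I-PER is made very unlikely, since an entity cannot start with I-.
transitions = {
    "O":     {"O": 0.0, "B-PER": 0.0,  "I-PER": -10.0},
    "B-PER": {"O": 0.0, "B-PER": -1.0, "I-PER": 0.5},
    "I-PER": {"O": 0.0, "B-PER": -1.0, "I-PER": 0.5},
}

greedy = [max(scores, key=scores.get) for scores in emissions]
print(greedy)                                  # ['O', 'I-PER', 'O']
print(viterbi_decode(emissions, transitions))  # ['B-PER', 'I-PER', 'O']
```

Picking the best tag per token independently yields the invalid sequence `O, I-PER, O`; the transition scores let the decoder prefer `B-PER, I-PER, O` instead, which illustrates why a CRF layer is commonly placed on top of the LSTM.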

---
### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-danish")

# make example sentence
sentence = Sentence("Jens Peter Hansen kommer fra Danmark")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

The final loop prints the following NER spans:
```
Span [1,2,3]: "Jens Peter Hansen" [− Labels: PER (0.9961)]
Span [6]: "Danmark" [− Labels: LOC (0.9816)]
```

So, the entities "*Jens Peter Hansen*" (labeled as a **person**) and "*Danmark*" (labeled as a **location**) are found in the sentence "*Jens Peter Hansen kommer fra Danmark*".
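
Each printed span lists the 1-based positions of the tokens it covers (e.g. `Span [1,2,3]`). The tagger labels individual tokens, and consecutive tokens of the same entity are then merged into one span. The plain-Python sketch below illustrates that grouping step. It assumes a simple BIO encoding (Flair's internal tag scheme may differ), and the tag sequence is written by hand to match the example output rather than produced by the model.

```python
from typing import List, Tuple

# (1-based token positions, surface text, label), mirroring Flair's printed spans
Span = Tuple[List[int], str, str]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Merge BIO token tags (B-X starts an entity, I-X continues it) into spans."""
    spans: List[Span] = []
    ids: List[int] = []
    label = ""
    # A trailing "O" sentinel guarantees that the last open span is flushed.
    for position, tag in enumerate(tags + ["O"], start=1):
        continues = bool(ids) and tag == f"I-{label}"
        if ids and not continues:
            text = " ".join(tokens[i - 1] for i in ids)
            spans.append((ids, text, label))
            ids, label = [], ""
        if tag != "O":
            ids.append(position)
            label = tag[2:]
    return spans


tokens = "Jens Peter Hansen kommer fra Danmark".split()
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC"]  # hand-written, not model output

for ids, text, label in bio_to_spans(tokens, tags):
    print(f'Span {ids}: "{text}" {label}')
```

In this sketch a stray `I-` tag with no matching open entity simply starts a new span, which is one common way to repair inconsistent tag sequences.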

---

### Training: Script to train this model

The model was trained by the [DaNLP project](https://github.com/alexandrainst/danlp) using the [DaNE corpus](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#danish-dependency-treebank-dane-dane). Check their repo for more information.

The following Flair script may be used to train such a model:

```python
from flair.data import Corpus
from flair.datasets import DANE
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. get the corpus
corpus: Corpus = DANE()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # FastText word embeddings for Danish
    WordEmbeddings('da'),

    # contextual string embeddings, forward
    FlairEmbeddings('da-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('da-backward'),
]

# embedding stack consists of Flair and FastText word embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-danish',
              train_with_dev=True,
              max_epochs=150)
```
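
In the script above, `StackedEmbeddings` combines the three embeddings by concatenating, for every token, the vectors each one produces, so each token is represented by a single vector whose length is the sum of the component lengths. The plain-Python sketch below illustrates only that concatenation step; the toy vectors and their sizes are hypothetical and much smaller than the real embeddings.

```python
from typing import List, Sequence


def stack_token_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate one token's vectors from several embedding models, in a fixed order."""
    return [value for vector in vectors for value in vector]


# Hypothetical toy vectors for a single token such as "Danmark".
word_vector = [0.1, 0.2, 0.3]   # stands in for WordEmbeddings('da')
forward_vector = [0.4, 0.5]     # stands in for FlairEmbeddings('da-forward')
backward_vector = [0.6, 0.7]    # stands in for FlairEmbeddings('da-backward')

stacked = stack_token_vectors([word_vector, forward_vector, backward_vector])
print(len(stacked))  # 3 + 2 + 2 = 7
```

The order matters: the stacked vector is only meaningful to the tagger if every token is built from the same components in the same order, which is why the embeddings are passed as a fixed list.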

---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik-etal-2019-flair,
    title = "{FLAIR}: An Easy-to-Use Framework for State-of-the-Art {NLP}",
    author = "Akbik, Alan and
      Bergmann, Tanja and
      Blythe, Duncan and
      Rasul, Kashif and
      Schweter, Stefan and
      Vollgraf, Roland",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (Demonstrations)",
    year = "2019",
    url = "https://www.aclweb.org/anthology/N19-4010",
    pages = "54--59",
}
```

And check the [DaNLP project](https://github.com/alexandrainst/danlp) for more information.

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
155