alanakbik committed on
Commit
decd357
1 Parent(s): a205601
Files changed (1)
  1. README.md +153 -0
README.md ADDED
@@ -0,0 +1,153 @@
+ ---
+ tags:
+ - flair
+ - token-classification
+ - sequence-tagger-model
+ language: da
+ datasets:
+ - DaNE
+ inference: false
+ ---
+
+ # Danish NER in Flair (default model)
+
+ This is the standard 4-class NER model for Danish that ships with [Flair](https://github.com/flairNLP/flair/).
+
+ F1-Score: **81.78** (DaNE)
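The README does not say how this score was computed. For NER it is customary to report micro-averaged F1 over exact span matches, so the sketch below assumes that definition. The function name `span_f1` and the `(sentence_id, first_token, last_token, tag)` tuple layout are illustrative assumptions, not part of Flair.

```python
def span_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over exact span matches.

    Both arguments are collections of hashable span descriptions, here
    (sentence_id, first_token, last_token, tag) tuples (an assumed layout).
    A predicted span counts as correct only if boundaries and tag both match.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: the person span is right, the location span gets the wrong tag.
gold_spans = {(0, 1, 3, "PER"), (0, 6, 6, "LOC")}
predicted_spans = {(0, 1, 3, "PER"), (0, 6, 6, "ORG")}

precision, recall, f1 = span_f1(gold_spans, predicted_spans)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this definition a span with the right boundaries but the wrong tag counts as both a false positive and a false negative, which is why one mislabeled entity out of two halves both precision and recall.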
+
+ Predicts 4 tags:
+
+ | **tag** | **meaning** |
+ |---------------------------------|-----------|
+ | PER | person name |
+ | LOC | location name |
+ | ORG | organization name |
+ | MISC | other name |
+
+ Based on Transformer embeddings and LSTM-CRF.
+
+ ---
+ # Demo: How to use in Flair
+
+ Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
+
+ ```python
+ from flair.data import Sentence
+ from flair.models import SequenceTagger
+
+ # load tagger
+ tagger = SequenceTagger.load("flair/ner-danish")
+
+ # make example sentence
+ sentence = Sentence("Jens Peter Hansen kommer fra Danmark")
+
+ # predict NER tags
+ tagger.predict(sentence)
+
+ # print sentence
+ print(sentence)
+
+ # print predicted NER spans
+ print('The following NER tags are found:')
+ # iterate over entities and print
+ for entity in sentence.get_spans('ner'):
+     print(entity)
+
+ ```
+
+ This yields the following output:
+ ```
+ Span [1,2,3]: "Jens Peter Hansen" [− Labels: PER (0.9961)]
+ Span [6]: "Danmark" [− Labels: LOC (0.9816)]
+ ```
+
+ So, the entities "*Jens Peter Hansen*" (labeled as a **person**) and "*Danmark*" (labeled as a **location**) are found in the sentence "*Jens Peter Hansen kommer fra Danmark*".
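Printing spans is fine for a demo, but downstream code usually wants plain values. The sketch below turns span-like objects into dictionaries and optionally drops low-confidence predictions. It assumes each span exposes `.text`, `.tag` and `.score`, which Flair `Span` objects do in the releases I am aware of; check this against your installed version. The helper name `spans_to_records` and the `min_score` parameter are hypothetical, and the stand-in objects only mirror the output shown above so that the example runs without Flair or the model download.

```python
from types import SimpleNamespace


def spans_to_records(spans, min_score=0.0):
    """Convert span-like objects into plain dictionaries.

    Assumes every span has .text, .tag and .score attributes.
    Spans scoring below min_score are skipped.
    """
    records = []
    for span in spans:
        if span.score >= min_score:
            records.append(
                {"text": span.text, "tag": span.tag, "score": round(span.score, 4)}
            )
    return records


# Stand-ins that mirror the two spans printed in the demo output.
demo_spans = [
    SimpleNamespace(text="Jens Peter Hansen", tag="PER", score=0.9961),
    SimpleNamespace(text="Danmark", tag="LOC", score=0.9816),
]

for record in spans_to_records(demo_spans, min_score=0.99):
    print(record)
```

With a real tagged sentence, the same helper would be called as `spans_to_records(sentence.get_spans('ner'))`.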
+
+
+ ---
+
+ ### Training: Script to train this model
+
+ The model was trained by the [DaNLP project](https://github.com/alexandrainst/danlp) using the [DaNE corpus](https://github.com/alexandrainst/danlp/blob/master/docs/docs/datasets.md#danish-dependency-treebank-dane-dane). Check their repo for more information.
+
+ The following Flair script may be used to train such a model:
+
+ ```python
+ from flair.data import Corpus
+ from flair.datasets import DANE
+ from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
+
+ # 1. get the corpus
+ corpus: Corpus = DANE()
+
+ # 2. what tag do we want to predict?
+ tag_type = 'ner'
+
+ # 3. make the tag dictionary from the corpus
+ tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
+
+ # 4. initialize each embedding we use
+ embedding_types = [
+
+     # Danish fastText word embeddings
+     WordEmbeddings('da'),
+
+     # contextual string embeddings, forward
+     FlairEmbeddings('da-forward'),
+
+     # contextual string embeddings, backward
+     FlairEmbeddings('da-backward'),
+ ]
+
+ # embedding stack consists of Flair and fastText word embeddings
+ embeddings = StackedEmbeddings(embeddings=embedding_types)
+
+ # 5. initialize sequence tagger
+ from flair.models import SequenceTagger
+
+ tagger = SequenceTagger(hidden_size=256,
+                         embeddings=embeddings,
+                         tag_dictionary=tag_dictionary,
+                         tag_type=tag_type)
+
+ # 6. initialize trainer
+ from flair.trainers import ModelTrainer
+
+ trainer = ModelTrainer(tagger, corpus)
+
+ # 7. run training
+ trainer.train('resources/taggers/ner-danish',
+               train_with_dev=True,
+               max_epochs=150)
+ ```
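Once training finishes, the tagger can be loaded back from the output directory. The sketch below assumes Flair wrote a `final-model.pt` checkpoint there, which is what I would expect when training with `train_with_dev=True` (no dev set is held out for picking a best model); verify the file name in your own output directory. Both helper names are hypothetical. Flair is imported lazily so the path helper runs even where Flair is not installed.

```python
from pathlib import Path


def final_model_path(output_dir):
    """Return the path of the checkpoint assumed to be saved after the last epoch."""
    return Path(output_dir) / "final-model.pt"


def load_trained_tagger(output_dir):
    """Load the trained tagger from output_dir (requires Flair and a finished run)."""
    # Lazy import: only needed when a model is actually loaded.
    from flair.models import SequenceTagger

    return SequenceTagger.load(str(final_model_path(output_dir)))


# Directory name taken from the training script above.
print(final_model_path("resources/taggers/ner-danish").as_posix())
```

The loaded tagger is then used exactly like the hosted one in the demo: call `predict` on a `Sentence` and read the spans with `get_spans('ner')`.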
+
+
+ ---
+
+ ### Cite
+
+ Please cite the following paper when using this model.
+
+ ```
+ @inproceedings{akbik-etal-2019-flair,
+     title = "{FLAIR}: An Easy-to-Use Framework for State-of-the-Art {NLP}",
+     author = "Akbik, Alan and
+       Bergmann, Tanja and
+       Blythe, Duncan and
+       Rasul, Kashif and
+       Schweter, Stefan and
+       Vollgraf, Roland",
+     booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics (Demonstrations)",
+     year = "2019",
+     url = "https://www.aclweb.org/anthology/N19-4010",
+     pages = "54--59",
+ }
+ ```
+
+ Also check the [DaNLP project](https://github.com/alexandrainst/danlp) for more information.
+
+ ---
+
+ ### Issues?
+
+ The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).