alanakbik committed on
Commit
b901b88
1 Parent(s): cf27b9e

initial model commit

Files changed (1)
  1. README.md +176 -0
README.md ADDED
@@ -0,0 +1,176 @@
+ ---
+ tags:
+ - flair
+ - token-classification
+ - sequence-tagger-model
+ language: de
+ datasets:
+ - legal
+ inference: false
+ ---
+
+ ## NER for German Legal Text in Flair (default model)
+
+ This is the legal NER model for German that ships with [Flair](https://github.com/flairNLP/flair/).
+
+ F1-Score: **96.35** (LER German dataset)
+
+ Predicts 19 tags:
+
+ | **tag** | **meaning** |
+ |---------|-------------|
+ | AN | Anwalt (lawyer) |
+ | EUN | Europäische Norm (European norm) |
+ | GS | Gesetz (law) |
+ | GRT | Gericht (court) |
+ | INN | Institution (institution) |
+ | LD | Land (country) |
+ | LDS | Landschaft (landscape) |
+ | LIT | Literatur (literature) |
+ | MRK | Marke (brand) |
+ | ORG | Organisation (organization) |
+ | PER | Person (person) |
+ | RR | Richter (judge) |
+ | RS | Rechtsprechung (case law) |
+ | ST | Stadt (city) |
+ | STR | Straße (street) |
+ | UN | Unternehmen (company) |
+ | VO | Verordnung (ordinance) |
+ | VS | Vorschrift (regulation) |
+ | VT | Vertrag (contract) |
+
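When post-processing predictions, it can help to map the tag abbreviations above to readable names. The following is a minimal sketch in plain Python; `TAG_MEANINGS` and `describe_tag` are hypothetical names introduced here for illustration and are not part of Flair.

```python
# Hypothetical lookup table built from the tag table above (not part of Flair).
TAG_MEANINGS = {
    "AN": "Anwalt",
    "EUN": "Europäische Norm",
    "GS": "Gesetz",
    "GRT": "Gericht",
    "INN": "Institution",
    "LD": "Land",
    "LDS": "Landschaft",
    "LIT": "Literatur",
    "MRK": "Marke",
    "ORG": "Organisation",
    "PER": "Person",
    "RR": "Richter",
    "RS": "Rechtsprechung",
    "ST": "Stadt",
    "STR": "Straße",
    "UN": "Unternehmen",
    "VO": "Verordnung",
    "VS": "Vorschrift",
    "VT": "Vertrag",
}


def describe_tag(tag: str) -> str:
    """Return the German class name for a tag, or the tag itself if unknown."""
    return TAG_MEANINGS.get(tag, tag)


print(describe_tag("GS"))  # Gesetz
```

Falling back to the raw tag for unknown keys keeps the helper usable if a model version emits a label that is not listed in the table.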
+ Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.
+
+ More details on the Legal NER dataset are available [here](https://github.com/elenanereiss/Legal-Entity-Recognition).
+
+ ---
+
+ ### Demo: How to use in Flair
+
+ Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
+
+ ```python
+ from flair.data import Sentence
+ from flair.models import SequenceTagger
+
+ # load tagger
+ tagger = SequenceTagger.load("flair/ner-german-legal")
+
+ # make example sentence (don't use the tokenizer, since it handles legal texts badly)
+ sentence = Sentence("Herr W. verstieß gegen § 36 Abs. 7 IfSG.", use_tokenizer=False)
+
+
+ # predict NER tags
+ tagger.predict(sentence)
+
+ # print sentence
+ print(sentence)
+
+ # print predicted NER spans
+ print('The following NER tags are found:')
+ # iterate over entities and print
+ for entity in sentence.get_spans('ner'):
+     print(entity)
+
+ ```
+
+ This yields the following output:
+ ```
+ Span [2]: "W." [− Labels: PER (0.9911)]
+ Span [5,6,7,8,9]: "§ 36 Abs. 7 IfSG." [− Labels: GS (0.5353)]
+
+ ```
+
+ So, the entities "*W.*" (labeled as a **person**, tag PER) and "*§ 36 Abs. 7 IfSG*" (labeled as a **law**, *Gesetz*, tag GS) are found in the sentence "*Herr W. verstieß gegen § 36 Abs. 7 IfSG.*".
+
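The span indices in the output are 1-based token positions. As far as we understand, `use_tokenizer=False` makes Flair split the text on whitespace only, which would explain why the final period stays attached to `IfSG.`. The sketch below mimics this with plain Python string splitting; it is an approximation under that assumption, does not call Flair, and `span_text` is a hypothetical helper introduced only for illustration.

```python
text = "Herr W. verstieß gegen § 36 Abs. 7 IfSG."

# Assumption: whitespace-only splitting approximates use_tokenizer=False.
tokens = text.split()


def span_text(tokens, positions):
    """Join the tokens at the given 1-based positions."""
    return " ".join(tokens[i - 1] for i in positions)


print(span_text(tokens, [2]))              # W.
print(span_text(tokens, [5, 6, 7, 8, 9]))  # § 36 Abs. 7 IfSG.
```

Under this assumption the sentence has nine tokens, so positions 5 to 9 cover the whole statute reference including the trailing period.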
+
+ ---
+
+ ### Training: Script to train this model
+
+ The following Flair script was used to train this model:
+
+ ```python
+ from flair.data import Corpus
+ from flair.datasets import LER_GERMAN
+ from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
+
+ # 1. get the corpus
+ corpus: Corpus = LER_GERMAN()
+
+ # 2. what tag do we want to predict?
+ tag_type = 'ner'
+
+ # 3. make the tag dictionary from the corpus
+ tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
+
+ # 4. initialize each embedding we use
+ embedding_types = [
+
+     # German FastText word embeddings
+     WordEmbeddings('de'),
+
+     # contextual string embeddings, forward
+     FlairEmbeddings('de-forward'),
+
+     # contextual string embeddings, backward
+     FlairEmbeddings('de-backward'),
+ ]
+
+ # embedding stack consists of Flair and FastText word embeddings
+ embeddings = StackedEmbeddings(embeddings=embedding_types)
+
+ # 5. initialize sequence tagger
+ from flair.models import SequenceTagger
+
+ tagger = SequenceTagger(hidden_size=256,
+                         embeddings=embeddings,
+                         tag_dictionary=tag_dictionary,
+                         tag_type=tag_type)
+
+ # 6. initialize trainer
+ from flair.trainers import ModelTrainer
+
+ trainer = ModelTrainer(tagger, corpus)
+
+ # 7. run training
+ trainer.train('resources/taggers/ner-german-legal',
+               train_with_dev=True,
+               max_epochs=150)
+ ```
+
+
+
+ ---
+
+ ### Cite
+
+ Please cite the following papers when using this model.
+
+ ```
+ @inproceedings{leitner2019fine,
+   author = {Elena Leitner and Georg Rehm and Julian Moreno-Schneider},
+   title = {{Fine-grained Named Entity Recognition in Legal Documents}},
+   booktitle = {Semantic Systems. The Power of AI and Knowledge
+                Graphs. Proceedings of the 15th International Conference
+                (SEMANTiCS 2019)},
+   year = 2019,
+   pages = {272--287},
+   pdf = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-33220-4_20.pdf}}
+ ```
+
+ ```
+ @inproceedings{akbik2018coling,
+   title = {Contextual String Embeddings for Sequence Labeling},
+   author = {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
+   booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
+   pages = {1638--1649},
+   year = {2018}
+ }
+ ```
+
+ ---
+
+ ### Issues?
+
+ The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).