Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/bert-base-german-finetuned-ler/README.md
README.md
ADDED
@@ -0,0 +1,100 @@
---
language: de
---

# German BERT + LER (Legal Entity Recognition) ⚖️

German BERT ([BERT-base-german-cased](https://huggingface.co/bert-base-german-cased)) fine-tuned on the [Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset for the **LER** (NER) downstream task.

## Details of the downstream task (NER) - Dataset

[Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition): Fine-grained Named Entity Recognition in Legal Documents.

The dataset consists of court decisions from 2017 and 2018, published online by the [Federal Ministry of Justice and Consumer Protection](http://www.rechtsprechung-im-internet.de). The documents originate from seven federal courts: Federal Labour Court (BAG), Federal Fiscal Court (BFH), Federal Court of Justice (BGH), Federal Patent Court (BPatG), Federal Social Court (BSG), Federal Constitutional Court (BVerfG) and Federal Administrative Court (BVerwG).


| Split                  | # Samples |
| ---------------------- | ----- |
| Train                  | 1657048   |
| Eval                   | 500000   |

- Training script: [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
- Colab: [How to fine-tune a model for NER using HF scripts](https://colab.research.google.com/drive/156Qrd7NsUHwA3nmQ6gXdZY0NzOvqk9AT?usp=sharing)

- Labels covered (and their distribution):

```
107 B-AN
918 B-EUN
2238 B-GRT
13282 B-GS
1113 B-INN
704 B-LD
151 B-LDS
2490 B-LIT
282 B-MRK
890 B-ORG
1374 B-PER
1480 B-RR
10046 B-RS
401 B-ST
68 B-STR
1011 B-UN
282 B-VO
391 B-VS
2648 B-VT
46 I-AN
6925 I-EUN
1957 I-GRT
70257 I-GS
2931 I-INN
153 I-LD
26 I-LDS
28881 I-LIT
383 I-MRK
1185 I-ORG
330 I-PER
106 I-RR
138938 I-RS
34 I-ST
55 I-STR
1259 I-UN
1572 I-VO
2488 I-VS
11121 I-VT
1348525 O
```
- [Annotation Guidelines (German)](https://github.com/elenanereiss/Legal-Entity-Recognition/blob/master/docs/Annotationsrichtlinien.pdf)
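
The label distribution listed above can be tallied per entity type. The following is a minimal sketch, assuming the listing is available as plain text with one `count label` pair per line; `LABEL_COUNTS`, `parse_counts` and `counts_by_entity` are names introduced here for illustration and are not part of the dataset tooling. The counts sum to 1657048, the size of the Train split, which suggests (but does not prove) that samples are counted per token.

```python
from collections import Counter
from typing import Dict

# Label listing copied from the model card ("count label" per line).
LABEL_COUNTS = """
107 B-AN
918 B-EUN
2238 B-GRT
13282 B-GS
1113 B-INN
704 B-LD
151 B-LDS
2490 B-LIT
282 B-MRK
890 B-ORG
1374 B-PER
1480 B-RR
10046 B-RS
401 B-ST
68 B-STR
1011 B-UN
282 B-VO
391 B-VS
2648 B-VT
46 I-AN
6925 I-EUN
1957 I-GRT
70257 I-GS
2931 I-INN
153 I-LD
26 I-LDS
28881 I-LIT
383 I-MRK
1185 I-ORG
330 I-PER
106 I-RR
138938 I-RS
34 I-ST
55 I-STR
1259 I-UN
1572 I-VO
2488 I-VS
11121 I-VT
1348525 O
"""


def parse_counts(listing: str) -> Dict[str, int]:
    """Parse 'count label' lines into a {label: count} mapping."""
    counts: Dict[str, int] = {}
    for line in listing.strip().splitlines():
        count, label = line.split()
        counts[label] = int(count)
    return counts


def counts_by_entity(counts: Dict[str, int]) -> Counter:
    """Sum B- and I- tag counts per entity type, skipping the 'O' tag."""
    totals: Counter = Counter()
    for label, count in counts.items():
        if label != "O":
            totals[label.split("-", 1)[1]] += count
    return totals


counts = parse_counts(LABEL_COUNTS)
entity_totals = counts_by_entity(counts)
total_tokens = sum(counts.values())

print(total_tokens)                  # 1657048, same as the Train split size
print(entity_totals.most_common(3))  # the three most frequent entity types
```

The `O` tag accounts for 1348525 of the labels, so the entity classes are heavily outnumbered. This is worth keeping in mind when reading the accuracy figure.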


## Metrics on evaluation set

| Metric | Score |
| :-------: | :-------: |
| F1 | **85.67** |
| Precision | **84.35** |
| Recall | **87.04** |
| Accuracy | **98.46** |
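
As a quick consistency check on the table above, the F1 score can be recomputed from the reported precision and recall. This sketch assumes the scores are percentages and that F1 is the usual harmonic mean of precision and recall; the model card does not say whether the metrics are computed per entity or per token. The helper name `f1_from_precision_recall` is introduced here for illustration.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the metrics table (assumed to be percentages).
precision, recall = 84.35, 87.04
f1 = f1_from_precision_recall(precision, recall)
print(round(f1, 2))  # 85.67
```

The recomputed value agrees with the reported F1 to two decimal places.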

## Model in action

Fast usage with **pipelines**:

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer into a NER (token classification) pipeline
nlp_ler = pipeline(
    "ner",
    model="mrm8488/bert-base-german-finetuned-ler",
    tokenizer="mrm8488/bert-base-german-finetuned-ler"
)

text = "Your German legal text here"

# Returns a list of dicts, one per token that received an entity label
nlp_ler(text)
```
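
By default the pipeline returns one prediction per WordPiece token, so multi-token entities arrive in pieces. The following is a minimal sketch of merging those pieces into entity spans using the B-/I- prefixes. The `sample_output` list is hypothetical and hand-written for illustration (it is not real model output), and `group_bio` is a helper introduced here; only the `word`, `entity` and `index` keys of the pipeline output are assumed.

```python
from typing import Dict, List, Tuple


def group_bio(tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Merge consecutive B-/I- tagged tokens into (label, text) spans."""
    spans: List[Dict] = []
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            continue
        prefix, _, label = tag.partition("-")
        word = tok["word"]
        last = spans[-1] if spans else None
        # True when this token directly follows the last token of the open span.
        is_next = last is not None and tok["index"] == last["last_index"] + 1
        if word.startswith("##") and is_next:
            # WordPiece continuation: glue onto the previous word without a space.
            last["text"] += word[2:]
            last["last_index"] = tok["index"]
        elif prefix == "I" and is_next and last["label"] == label:
            # Next word of the same entity.
            last["text"] += " " + word
            last["last_index"] = tok["index"]
        else:
            # A B- tag, or an I- tag with nothing to attach to, opens a new span.
            spans.append({"label": label, "text": word, "last_index": tok["index"]})
    return [(span["label"], span["text"]) for span in spans]


# Hypothetical pipeline output, written by hand for illustration only.
sample_output = [
    {"word": "Bundes", "entity": "B-GRT", "index": 2, "score": 0.99},
    {"word": "##gericht", "entity": "I-GRT", "index": 3, "score": 0.99},
    {"word": "##shof", "entity": "I-GRT", "index": 4, "score": 0.99},
    {"word": "Art", "entity": "B-GS", "index": 7, "score": 0.98},
    {"word": "3", "entity": "I-GS", "index": 8, "score": 0.98},
    {"word": "GG", "entity": "I-GS", "index": 9, "score": 0.98},
]

print(group_bio(sample_output))
# [('GRT', 'Bundesgerichtshof'), ('GS', 'Art 3 GG')]
```

Recent versions of `transformers` also accept an `aggregation_strategy` argument on the token classification pipeline, which performs a similar grouping inside the library. The manual version above is mainly useful for seeing what that grouping does.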

> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)

> Made with <span style="color: #e25555;">&hearts;</span> in Spain