---
language: de
---
# German BERT + LER (Legal Entity Recognition) ⚖️
German BERT ([BERT-base-german-cased](https://huggingface.co/bert-base-german-cased)) fine-tuned on the [Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset for the **LER** (NER) downstream task.
## Details of the downstream task (NER) - Dataset
[Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition): Fine-grained Named Entity Recognition in Legal Documents.
The dataset consists of court decisions from 2017 and 2018 that were published online by the [Federal Ministry of Justice and Consumer Protection](http://www.rechtsprechung-im-internet.de). The documents originate from seven federal courts: Federal Labour Court (BAG), Federal Fiscal Court (BFH), Federal Court of Justice (BGH), Federal Patent Court (BPatG), Federal Social Court (BSG), Federal Constitutional Court (BVerfG) and Federal Administrative Court (BVerwG).
| Split | # Tokens |
| ----- | -------- |
| Train | 1657048  |
| Eval  | 500000   |
- Training script: [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
- Colab: [How to fine-tune a model for NER using HF scripts](https://colab.research.google.com/drive/156Qrd7NsUHwA3nmQ6gXdZY0NzOvqk9AT?usp=sharing)
- Labels covered (and their distribution; see the counting sketch after this list):
```
107 B-AN
918 B-EUN
2238 B-GRT
13282 B-GS
1113 B-INN
704 B-LD
151 B-LDS
2490 B-LIT
282 B-MRK
890 B-ORG
1374 B-PER
1480 B-RR
10046 B-RS
401 B-ST
68 B-STR
1011 B-UN
282 B-VO
391 B-VS
2648 B-VT
46 I-AN
6925 I-EUN
1957 I-GRT
70257 I-GS
2931 I-INN
153 I-LD
26 I-LDS
28881 I-LIT
383 I-MRK
1185 I-ORG
330 I-PER
106 I-RR
138938 I-RS
34 I-ST
55 I-STR
1259 I-UN
1572 I-VO
2488 I-VS
11121 I-VT
1348525 O
```
- [Annotation Guidelines (German)](https://github.com/elenanereiss/Legal-Entity-Recognition/blob/master/docs/Annotationsrichtlinien.pdf)
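
The dataset is distributed as CoNLL-style text files. As a rough sketch for reproducing the counts above (assuming one whitespace-separated `token label` pair per line with blank lines between sentences; the file name `ler_train.conll` is a placeholder):

```python
# Sketch: count BIO label frequencies in a CoNLL-style file.
# Assumes one "token label" pair per line and blank lines between sentences.
from collections import Counter

counts = Counter()
with open("ler_train.conll", encoding="utf-8") as f:  # placeholder path
    for line in f:
        line = line.strip()
        if not line:
            continue  # sentence boundary
        label = line.split()[-1]  # the label is the last column
        counts[label] += 1

for label, n in sorted(counts.items()):
    print(f"{n:>8} {label}")
```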
## Metrics on the evaluation set
| Metric    | Score (%) |
| :-------: | :-------: |
| F1        | **85.67** |
| Precision | **84.35** |
| Recall    | **87.04** |
| Accuracy  | **98.46** |
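
These are entity-level scores. A minimal sketch of how such metrics can be computed with the `seqeval` package (the label sequences below are toy examples, not the actual evaluation data):

```python
# Toy illustration of entity-level scoring with seqeval.
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

# Gold and predicted BIO tag sequences, one inner list per sentence.
y_true = [["B-GS", "I-GS", "O", "B-RR", "O"]]
y_pred = [["B-GS", "I-GS", "O", "O", "O"]]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```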
## Model in action
Fast usage with **pipelines**:
```python
from transformers import pipeline

# Token-classification (NER) pipeline with the fine-tuned LER model
nlp_ler = pipeline(
    "ner",
    model="mrm8488/bert-base-german-finetuned-ler",
    tokenizer="mrm8488/bert-base-german-finetuned-ler"
)

text = "Your German legal text here"

nlp_ler(text)
```
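
If you need more control (e.g. raw logits or custom grouping of sub-tokens into entity spans), the model can also be loaded directly with `AutoTokenizer` and `AutoModelForTokenClassification`. A minimal sketch, with a placeholder input sentence:

```python
# Sketch: token-level predictions without the pipeline helper.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "mrm8488/bert-base-german-finetuned-ler"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

text = "Your German legal text here"  # placeholder
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predicted_ids):
    print(f"{token}\t{model.config.id2label[label_id.item()]}")
```

On recent `transformers` versions, you can also pass `aggregation_strategy="simple"` to the pipeline above to merge sub-token predictions into entity spans.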
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain