---
language: de
---

# German BERT + LER (Legal Entity Recognition) ⚖️

German BERT ([BERT-base-german-cased](https://huggingface.co/bert-base-german-cased)) fine-tuned on the [Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset for the **LER** (NER) downstream task.

## Details of the downstream task (NER) - Dataset

[Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition): Fine-grained Named Entity Recognition in Legal Documents.

Court decisions from 2017 and 2018 were selected for the dataset, published online by the [Federal Ministry of Justice and Consumer Protection](http://www.rechtsprechung-im-internet.de). The documents originate from seven federal courts: Federal Labour Court (BAG), Federal Fiscal Court (BFH), Federal Court of Justice (BGH), Federal Patent Court (BPatG), Federal Social Court (BSG), Federal Constitutional Court (BVerfG) and Federal Administrative Court (BVerwG).

| Split | # Samples |
| ----- | --------- |
| Train | 1657048   |
| Eval  | 500000    |

- Training script: [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
- Colab: [How to fine-tune a model for NER using HF scripts](https://colab.research.google.com/drive/156Qrd7NsUHwA3nmQ6gXdZY0NzOvqk9AT?usp=sharing)
- Labels covered (and their distribution):

```
    107 B-AN
    918 B-EUN
   2238 B-GRT
  13282 B-GS
   1113 B-INN
    704 B-LD
    151 B-LDS
   2490 B-LIT
    282 B-MRK
    890 B-ORG
   1374 B-PER
   1480 B-RR
  10046 B-RS
    401 B-ST
     68 B-STR
   1011 B-UN
    282 B-VO
    391 B-VS
   2648 B-VT
     46 I-AN
   6925 I-EUN
   1957 I-GRT
  70257 I-GS
   2931 I-INN
    153 I-LD
     26 I-LDS
  28881 I-LIT
    383 I-MRK
   1185 I-ORG
    330 I-PER
    106 I-RR
 138938 I-RS
     34 I-ST
     55 I-STR
   1259 I-UN
   1572 I-VO
   2488 I-VS
  11121 I-VT
1348525 O
```

- [Annotation Guidelines (German)](https://github.com/elenanereiss/Legal-Entity-Recognition/blob/master/docs/Annotationsrichtlinien.pdf)

## Metrics on evaluation set

|  Metric   |   Score   |
| :-------: | :-------: |
|    F1     | **85.67** |
| Precision | **84.35** |
|  Recall   | **87.04** |
| Accuracy  | **98.46** |

## Model in action

Quick usage with **pipelines**:

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer into a token-classification pipeline
nlp_ler = pipeline(
    "ner",
    model="mrm8488/bert-base-german-finetuned-ler",
    tokenizer="mrm8488/bert-base-german-finetuned-ler"
)

text = "Your German legal text here"

# Returns one prediction per recognized token
entities = nlp_ler(text)
print(entities)
```

> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)

> Made with ♥ in Spain
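
The pipeline shown under "Model in action" returns token-level predictions, so an entity that spans several tokens arrives as one `B-` tag followed by `I-` tags, and WordPiece sub-tokens carry a `##` prefix. The following is a minimal sketch of merging those predictions into whole entities. It assumes each prediction is a dict with `entity` and `word` keys and an optional `index` key; `group_entities` is a hypothetical helper written for illustration, and the sample predictions are hand-written, not actual output of this model.

```python
def group_entities(predictions):
    """Merge token-level B-/I- predictions into (label, text) entity spans."""
    spans = []
    current = None  # [label, text, last_index] of the span being built

    for pred in predictions:
        tag, word = pred["entity"], pred["word"]
        if tag == "O":
            # A non-entity token always closes the current span
            current = None
            continue

        prefix, _, label = tag.partition("-")
        is_subword = word.startswith("##")
        index = pred.get("index")  # token position, if the caller provides it

        # Without position information, consecutive list items are assumed adjacent
        adjacent = current is not None and (
            index is None or current[2] is None or index == current[2] + 1
        )

        if adjacent and current[0] == label and (prefix == "I" or is_subword):
            # Continue the span: glue sub-tokens, separate full words with a space
            current[1] += word[2:] if is_subword else " " + word
            current[2] = index
        else:
            # Start a new span (a B- tag, a label change, or a gap in positions)
            current = [label, word[2:] if is_subword else word, index]
            spans.append(current)

    return [(label, text) for label, text, _ in spans]


# Hand-written example predictions (assumption: not real model output)
sample = [
    {"entity": "B-GRT", "word": "BVerfG", "index": 2},
    {"entity": "B-GS", "word": "§", "index": 5},
    {"entity": "I-GS", "word": "93", "index": 6},
    {"entity": "I-GS", "word": "BVerfG", "index": 7},
    {"entity": "I-GS", "word": "##G", "index": 8},
]

entities = group_entities(sample)
print(entities)  # [('GRT', 'BVerfG'), ('GS', '§ 93 BVerfGG')]
```

Recent versions of `transformers` also accept an `aggregation_strategy` argument on the `"ner"` pipeline, which performs a similar grouping inside the library and may be preferable when it is available.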