Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/bert-base-german-finetuned-ler/README.md

Files changed (1)

README.md +100 -0

README.md ADDED

	@@ -0,0 +1,100 @@

+---
+language: de
+---
+# German BERT + LER (Legal Entity Recognition) ⚖️
+German BERT ([BERT-base-german-cased](https://huggingface.co/bert-base-german-cased)) fine-tuned on the [Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset for the **LER** (NER) downstream task.
+## Details of the downstream task (NER) - Dataset
+[Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition): Fine-grained Named Entity Recognition in Legal Documents.
+The dataset consists of court decisions from 2017 and 2018 that were published online by the [Federal Ministry of Justice and Consumer Protection](http://www.rechtsprechung-im-internet.de). The documents originate from seven federal courts: Federal Labour Court (BAG), Federal Fiscal Court (BFH), Federal Court of Justice (BGH), Federal Patent Court (BPatG), Federal Social Court (BSG), Federal Constitutional Court (BVerfG) and Federal Administrative Court (BVerwG).
+| Split | # Samples |
+| ----- | --------- |
+| Train | 1657048   |
+| Eval  | 500000    |
+- Training script: [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
+- Colab: [How to fine-tune a model for NER using HF scripts](https://colab.research.google.com/drive/156Qrd7NsUHwA3nmQ6gXdZY0NzOvqk9AT?usp=sharing)
+- Labels covered (and their distribution):
+```
+    107 B-AN
+    918 B-EUN
+   2238 B-GRT
+  13282 B-GS
+   1113 B-INN
+    704 B-LD
+    151 B-LDS
+   2490 B-LIT
+    282 B-MRK
+    890 B-ORG
+   1374 B-PER
+   1480 B-RR
+  10046 B-RS
+    401 B-ST
+     68 B-STR
+   1011 B-UN
+    282 B-VO
+    391 B-VS
+   2648 B-VT
+     46 I-AN
+   6925 I-EUN
+   1957 I-GRT
+  70257 I-GS
+   2931 I-INN
+    153 I-LD
+     26 I-LDS
+  28881 I-LIT
+    383 I-MRK
+   1185 I-ORG
+    330 I-PER
+    106 I-RR
+ 138938 I-RS
+     34 I-ST
+     55 I-STR
+   1259 I-UN
+   1572 I-VO
+   2488 I-VS
+  11121 I-VT
+1348525 O
+```
+- [Annotation Guidelines (German)](https://github.com/elenanereiss/Legal-Entity-Recognition/blob/master/docs/Annotationsrichtlinien.pdf)
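A quick sanity check on the label distribution above, not part of the original card: the listed counts sum to 1,657,048, which equals the Train row of the split table. This suggests that the "# Samples" figures count labelled tokens and that the distribution describes the training split, though the card does not say so explicitly. The sketch below redoes the arithmetic; all variable names are illustrative.

```python
from collections import Counter

# Label counts copied from the distribution listed in the model card.
label_counts = {
    "B-AN": 107, "B-EUN": 918, "B-GRT": 2238, "B-GS": 13282, "B-INN": 1113,
    "B-LD": 704, "B-LDS": 151, "B-LIT": 2490, "B-MRK": 282, "B-ORG": 890,
    "B-PER": 1374, "B-RR": 1480, "B-RS": 10046, "B-ST": 401, "B-STR": 68,
    "B-UN": 1011, "B-VO": 282, "B-VS": 391, "B-VT": 2648,
    "I-AN": 46, "I-EUN": 6925, "I-GRT": 1957, "I-GS": 70257, "I-INN": 2931,
    "I-LD": 153, "I-LDS": 26, "I-LIT": 28881, "I-MRK": 383, "I-ORG": 1185,
    "I-PER": 330, "I-RR": 106, "I-RS": 138938, "I-ST": 34, "I-STR": 55,
    "I-UN": 1259, "I-VO": 1572, "I-VS": 2488, "I-VT": 11121,
    "O": 1348525,
}

total_tokens = sum(label_counts.values())
entity_tokens = total_tokens - label_counts["O"]

# Merge the B- and I- counts of each entity class into one total per class.
per_class = Counter()
for tag, count in label_counts.items():
    if tag != "O":
        per_class[tag.split("-", 1)[1]] += count

print(total_tokens)              # 1657048
print(entity_tokens)             # 308523
print(per_class.most_common(3))  # [('RS', 148984), ('GS', 83539), ('LIT', 31371)]
```

Roughly 81% of the tokens carry the `O` label, which is worth keeping in mind when reading token-level accuracy figures.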
+## Metrics on evaluation set
+| Metric    | Score     |
+| :-------: | :-------: |
+| F1        | **85.67** |
+| Precision | **84.35** |
+| Recall    | **87.04** |
+| Accuracy  | **98.46** |
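The reported scores can be cross-checked against each other. Assuming the F1 value is the harmonic mean of the reported precision and recall (the card does not state how it was computed), a minimal sketch of the calculation looks like this:

```python
# Precision and recall as reported in the metrics table (in percent).
precision = 84.35
recall = 87.04

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # 85.67
```

The result matches the F1 in the table, so the three figures are internally consistent under that assumption.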
+## Model in action
+Fast usage with **pipelines**:
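+```python
+from transformers import pipeline
+
+# Load the fine-tuned model and its tokenizer into a token-classification ("ner") pipeline
+nlp_ler = pipeline(
+    "ner",
+    model="mrm8488/bert-base-german-finetuned-ler",
+    tokenizer="mrm8488/bert-base-german-finetuned-ler"
+)
+
+text = "Your German legal text here"
+# The pipeline returns a list of dicts, one per tagged token
+print(nlp_ler(text))
+```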
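The labels listed earlier follow the B-/I-/O scheme, so per-token predictions usually need to be merged into entity spans before they are useful. The exact shape of the pipeline output depends on the installed `transformers` version, so the sketch below deliberately works on plain `(token, tag)` pairs. The helper name `group_bio` and the sample sentence with its hand-assigned tags are illustrative assumptions, not model output.

```python
def group_bio(tagged_tokens):
    """Merge (token, tag) pairs in BIO format into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the currently open span, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span
            # (such orphan I- tokens are dropped in this sketch).
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hand-written example; the tags are assigned manually for illustration.
example = [
    ("Das", "O"), ("Urteil", "O"), ("des", "O"), ("BGH", "B-GRT"),
    ("nach", "O"), ("§", "B-GS"), ("823", "I-GS"), ("BGB", "I-GS"),
]
print(group_bio(example))  # [('GRT', 'BGH'), ('GS', '§ 823 BGB')]
```

Joining tokens with a space is a simplification: WordPiece sub-tokens (prefixed with `##`) would need to be glued back together first.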
+> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
+> Made with <span style="color: #e25555;">&hearts;</span> in Spain