julien-c HF staff committed on
Commit
f896581
1 Parent(s): ff1ef90

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/mrm8488/bert-base-german-finetuned-ler/README.md

Files changed (1)
  1. README.md +100 -0
README.md ADDED
@@ -0,0 +1,100 @@
+ ---
+ language: de
+ ---
+
+ # German BERT + LER (Legal Entity Recognition) ⚖️
+
+ German BERT ([BERT-base-german-cased](https://huggingface.co/bert-base-german-cased)) fine-tuned on the [Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition) dataset for the **LER** (NER) downstream task.
+
+ ## Details of the downstream task (NER) - Dataset
+
+ [Legal-Entity-Recognition](https://github.com/elenanereiss/Legal-Entity-Recognition): Fine-grained Named Entity Recognition in Legal Documents.
+
+ The dataset consists of court decisions from 2017 and 2018 published online by the [Federal Ministry of Justice and Consumer Protection](http://www.rechtsprechung-im-internet.de). The documents originate from seven federal courts: Federal Labour Court (BAG), Federal Fiscal Court (BFH), Federal Court of Justice (BGH), Federal Patent Court (BPatG), Federal Social Court (BSG), Federal Constitutional Court (BVerfG) and Federal Administrative Court (BVerwG).
+
+ | Split | # Samples |
+ | ----- | --------- |
+ | Train | 1657048 |
+ | Eval | 500000 |
+
+ - Training script: [Fine-tuning script for NER provided by Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_ner_old.py)
+ - Colab: [How to fine-tune a model for NER using HF scripts](https://colab.research.google.com/drive/156Qrd7NsUHwA3nmQ6gXdZY0NzOvqk9AT?usp=sharing)
+
+ - Labels covered (and their distribution):
+
+ ```
+ 107 B-AN
+ 918 B-EUN
+ 2238 B-GRT
+ 13282 B-GS
+ 1113 B-INN
+ 704 B-LD
+ 151 B-LDS
+ 2490 B-LIT
+ 282 B-MRK
+ 890 B-ORG
+ 1374 B-PER
+ 1480 B-RR
+ 10046 B-RS
+ 401 B-ST
+ 68 B-STR
+ 1011 B-UN
+ 282 B-VO
+ 391 B-VS
+ 2648 B-VT
+ 46 I-AN
+ 6925 I-EUN
+ 1957 I-GRT
+ 70257 I-GS
+ 2931 I-INN
+ 153 I-LD
+ 26 I-LDS
+ 28881 I-LIT
+ 383 I-MRK
+ 1185 I-ORG
+ 330 I-PER
+ 106 I-RR
+ 138938 I-RS
+ 34 I-ST
+ 55 I-STR
+ 1259 I-UN
+ 1572 I-VO
+ 2488 I-VS
+ 11121 I-VT
+ 1348525 O
+ ```
+ - [Annotation Guidelines (German)](https://github.com/elenanereiss/Legal-Entity-Recognition/blob/master/docs/Annotationsrichtlinien.pdf)
+
+
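
As a sanity check on the label distribution listed above, the per-label counts can be summed and compared with the split sizes. The sketch below hard-codes the counts from that listing (the names `label_counts`, `total` and `entity_tokens` are only illustrative). The total equals the 1657048 figure reported for the train split, which suggests that the "# Samples" column counts tokens rather than sentences. That reading is an inference from the arithmetic, not something the model card states.

```python
# Per-label counts copied from the distribution listing in the model card.
label_counts = {
    "B-AN": 107, "B-EUN": 918, "B-GRT": 2238, "B-GS": 13282, "B-INN": 1113,
    "B-LD": 704, "B-LDS": 151, "B-LIT": 2490, "B-MRK": 282, "B-ORG": 890,
    "B-PER": 1374, "B-RR": 1480, "B-RS": 10046, "B-ST": 401, "B-STR": 68,
    "B-UN": 1011, "B-VO": 282, "B-VS": 391, "B-VT": 2648,
    "I-AN": 46, "I-EUN": 6925, "I-GRT": 1957, "I-GS": 70257, "I-INN": 2931,
    "I-LD": 153, "I-LDS": 26, "I-LIT": 28881, "I-MRK": 383, "I-ORG": 1185,
    "I-PER": 330, "I-RR": 106, "I-RS": 138938, "I-ST": 34, "I-STR": 55,
    "I-UN": 1259, "I-VO": 1572, "I-VS": 2488, "I-VT": 11121,
    "O": 1348525,
}

# Sum over all labels, then separate out everything that is not "O".
total = sum(label_counts.values())
entity_tokens = total - label_counts["O"]

print(f"labels:        {len(label_counts)}")
print(f"total count:   {total}")
print(f"entity tokens: {entity_tokens} ({entity_tokens / total:.1%})")
```

The large share of `O` labels is also worth keeping in mind when reading token-level accuracy, since a tagger that mostly predicts `O` already scores high on that metric.
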
+ ## Metrics on evaluation set
+
+ | Metric | Score |
+ | :-------: | :-------: |
+ | F1 | **85.67** |
+ | Precision | **84.35** |
+ | Recall | **87.04** |
+ | Accuracy | **98.46** |
+
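
F1 is the harmonic mean of precision and recall, so the three figures in the table can be cross-checked against each other. A minimal sketch using only the values reported above (the helper name `f1_from` is made up for illustration):

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the metrics table.
precision, recall = 84.35, 87.04

f1 = f1_from(precision, recall)
print(round(f1, 2))  # 85.67, which agrees with the reported F1
```
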
+ ## Model in action
+
+ Quick usage with **pipelines**:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the fine-tuned model and its tokenizer from the Hugging Face Hub
+ nlp_ler = pipeline(
+     "ner",
+     model="mrm8488/bert-base-german-finetuned-ler",
+     tokenizer="mrm8488/bert-base-german-finetuned-ler",
+ )
+
+ text = "Your German legal text here"
+
+ # Run the tagger and print its token-level predictions
+ print(nlp_ler(text))
+ ```
+
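
Without aggregation, the `"ner"` pipeline returns one dictionary per tagged token, so multi-token entities and WordPiece fragments (prefixed with `##`) arrive in pieces. The sketch below shows one way to stitch them back together. It assumes each dictionary carries the `word`, `entity` and `index` keys, and the helper `merge_bio` and the `sample` list are hypothetical: the sample is hand-written for illustration and is not real output from this model.

```python
from typing import Any, Dict, List


def merge_bio(tokens: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Merge adjacent B-/I- tagged tokens and '##' word pieces into spans."""
    spans: List[Dict[str, Any]] = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue  # not part of any entity
        prefix, _, label = tok["entity"].partition("-")
        # A token can only extend the previous span if it directly follows it.
        adjacent = bool(spans) and tok["index"] == spans[-1]["last_index"] + 1
        if adjacent and tok["word"].startswith("##"):
            # WordPiece continuation: glue onto the previous word.
            spans[-1]["text"] += tok["word"][2:]
        elif adjacent and prefix == "I" and spans[-1]["label"] == label:
            # Next word of the same entity.
            spans[-1]["text"] += " " + tok["word"]
        else:
            # Start a new entity span.
            spans.append(
                {"label": label, "text": tok["word"], "last_index": tok["index"]}
            )
            continue
        spans[-1]["last_index"] = tok["index"]
    return [{"label": s["label"], "text": s["text"]} for s in spans]


# Hand-written sample in the shape of ungrouped pipeline output (hypothetical).
sample = [
    {"word": "Bundes", "entity": "B-GRT", "index": 2},
    {"word": "##gericht", "entity": "I-GRT", "index": 3},
    {"word": "##shof", "entity": "I-GRT", "index": 4},
    {"word": "§", "entity": "B-GS", "index": 7},
    {"word": "1", "entity": "I-GS", "index": 8},
    {"word": "BGB", "entity": "I-GS", "index": 9},
]

entities = merge_bio(sample)
for entity in entities:
    print(entity["label"], "->", entity["text"])
```

Recent versions of `transformers` can also do this grouping inside the pipeline through the `aggregation_strategy` argument; whether that is available depends on the installed version, so the manual merge above is shown as a version-independent alternative.
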
+ > Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
+
+ > Made with <span style="color: #e25555;">&hearts;</span> in Spain