dominguesm committed on
Commit
4a5a028
1 Parent(s): 8deee31
README.md CHANGED
@@ -1,3 +1,86 @@
1
  ---
2
+ thumbnail: "https://i.imgur.com/tpI1iT5.jpg"
3
  license: apache-2.0
4
+ language:
5
+ - pt
6
+ tags:
7
+ - generated_from_trainer
8
+ metrics:
9
+ - precision
10
+ - recall
11
+ - f1
12
+ - accuracy
13
+ model-index:
14
+ - name: checkpoints
15
+ results:
16
+ - task:
17
+ name: Token Classification
18
+ type: token-classification
19
+ metrics:
20
+ - name: F1
21
+ type: f1
22
+ value: 0.9525622169191057
23
+ - name: Precision
24
+ type: precision
25
+ value: 0.9438680702115613
26
+ - name: Recall
27
+ type: recall
28
+ value: 0.961418019517758
29
+ - name: Accuracy
30
+ type: accuracy
31
+ value: 0.9894253721279602
32
+ - name: Loss
33
+ type: loss
34
+ value: 0.030161771923303604
35
+ widget:
36
+ - text: "Ao Instituto Médico Legal da jurisdição do acidente ou da residência cumpre fornecer, no prazo de 90 dias, laudo à vítima (art. 5, § 5, Lei n. 6.194/74 de 19 de dezembro de 1974), função técnica que pode ser suprida por prova pericial realizada por ordem do juízo da causa, ou por prova técnica realizada no âmbito administrativo que se mostre coerente com os demais elementos de prova constante dos autos."
37
+ - text: "Acrescento que não há de se falar em violação do artigo 114, § 3º, da Constituição Federal, posto que referido dispositivo revela-se impertinente, tratando da possibilidade de ajuizamento de dissídio coletivo pelo Ministério Público do Trabalho nos casos de greve em atividade essencial."
38
+ - text: "Dispõe sobre o estágio de estudantes; altera a redação do art. 428 da Consolidação das Leis do Trabalho – CLT, aprovada pelo Decreto-Lei no 5.452, de 1o de maio de 1943, e a Lei no 9.394, de 20 de dezembro de 1996; revoga as Leis nos 6.494, de 7 de dezembro de 1977, e 8.859, de 23 de março de 1994, o parágrafo único do art. 82 da Lei no 9.394, de 20 de dezembro de 1996, e o art. 6o da Medida Provisória no 2.164-41, de 24 de agosto de 2001; e dá outras providências."
39
  ---
40
+
41
+ ## (BERT base) NER model in the legal domain in Portuguese
42
+
43
+ **README under construction**
44
+
45
+ **ner-legal-bert-base-cased-ptbr** is a named-entity recognition (NER) model for Portuguese legal text. It was fine-tuned for token classification from [dominguesm/legal-bert-base-cased-ptbr](https://huggingface.co/dominguesm/legal-bert-base-cased-ptbr).
46
+
47
+ The model is intended to support NLP research in the legal field, computer law, and legal technology applications. It was trained on several legal texts in Portuguese (more information below).
48
+
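A minimal usage sketch, assuming the checkpoint is published on the Hub as `dominguesm/ner-legal-bert-base-cased-ptbr` (the repository id is inferred from the author and model name above and is not confirmed by this commit) and that the `transformers` library is installed. The helper names `build_ner_pipeline` and `tag_text` are hypothetical.

```python
# Assumed Hub repository id: author name + model name from the README.
MODEL_ID = "dominguesm/ner-legal-bert-base-cased-ptbr"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Return a token-classification pipeline for the given checkpoint."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups word pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def tag_text(text: str, model_id: str = MODEL_ID):
    """Run the pipeline and return (entity_group, surface text) pairs."""
    ner = build_ner_pipeline(model_id)
    return [(entity["entity_group"], entity["word"]) for entity in ner(text)]
```

Calling `tag_text(...)` on one of the widget sentences from the front matter downloads the weights on first use, so it needs network access.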
49
+
50
+ ## Training procedure
51
+
52
+ ### Training results
53
+
54
+ ```
55
+ Num examples = 971932
56
+ Num Epochs = 3
57
+ Instantaneous batch size per device = 64
58
+ Total train batch size (w. parallel, distributed & accumulation) = 128
59
+ Gradient Accumulation steps = 2
60
+ Total optimization steps = 22779
61
+ ```
62
+
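The logged numbers above are mutually consistent, which the following sketch checks. It assumes a single device, the last partial batch being kept, and the number of optimizer updates per epoch being rounded down; these rounding choices are assumptions that happen to reproduce the logged total of 22779 steps.

```python
import math

# Values copied from the training log above.
num_examples = 971932
num_epochs = 3
per_device_batch = 64
grad_accum_steps = 2
total_train_batch = 128

# Devices implied by the log: 128 / (64 * 2) = 1.
num_devices = total_train_batch // (per_device_batch * grad_accum_steps)

# Batches per epoch; the final partial batch is kept (assumption).
batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))

# One optimizer update every `grad_accum_steps` batches, floored per epoch.
updates_per_epoch = batches_per_epoch // grad_accum_steps

total_steps = updates_per_epoch * num_epochs
print(num_devices, batches_per_epoch, updates_per_epoch, total_steps)
# prints: 1 15187 7593 22779
```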
63
+ | Step | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
64
+ | ---- | ------------- | --------------- | --------- | ------ | -- | -------- |
65
+ |1000 |0.113900 | 0.057008 | 0.898600| 0.938444| 0.918090| 0.980961|
66
+ |2000 |0.052800 | 0.048254 | 0.917243 | 0.941188 | 0.929062 | 0.983854|
67
+ |3000 |0.046200 | 0.043833 | 0.919706 | 0.948411 | 0.933838 | 0.984931|
68
+ |4000 |0.043500 | 0.039796 | 0.928439 | 0.947058 | 0.937656 | 0.985891|
69
+ |5000 |0.041400 | 0.039421 | 0.926103 | 0.952857 | 0.939290 | 0.986130|
70
+ |6000 |0.039700 | 0.038599 | 0.922376 | 0.956257 | 0.939011 | 0.986093|
71
+ |7000 |0.037800 | 0.036463 | 0.935125 | 0.950937 | 0.942964 | 0.987030|
72
+ |8000 |0.035900 | 0.035706 | 0.934638 | 0.954147 | 0.944292 | 0.987433|
73
+ |9000 |0.033800 | 0.034518 | 0.940354 | 0.951991 | 0.946136 | 0.987866|
74
+ |10000 |0.033600 | 0.033454 | 0.938170 | 0.956097 | 0.947049 | 0.988066|
75
+ |11000 |0.032700 | 0.032899 | 0.934130 | 0.959491 | 0.946641 | 0.988092|
76
+ |12000 |0.032200 | 0.032477 | 0.937400 | 0.959150 | 0.948151 | 0.988305|
77
+ |13000 |0.031200 | 0.033207 | 0.937058 | 0.960506 | 0.948637 | 0.988340|
78
+ |14000 |0.031400 | 0.031711 | 0.938765 | 0.959711 | 0.949123 | 0.988635|
79
+ |15000 |0.030600 | 0.031519 | 0.940488 | 0.959413 | 0.949856 | 0.988709|
80
+ |16000 |0.028500 | 0.031618 | 0.943643 | 0.957693 | 0.950616 | 0.988891|
81
+ |17000 |0.028000 | 0.031106 | 0.941109 | 0.960687 | 0.950797 | 0.989016|
82
+ |18000 |0.027800 | 0.030712 | 0.942821 | 0.960528 | 0.951592 | 0.989198|
83
+ |19000 |0.027500 | 0.030523 | 0.942950 | 0.960947 | 0.951864 | 0.989348|
84
+ |20000 |0.027400 | 0.030577 | 0.942462 | 0.961754 | 0.952010 | 0.989295|
85
+ |21000 |0.027000 | 0.030025 | 0.944483 | 0.960497 | 0.952422 | 0.989445|
86
+ |22000 |0.026800 | 0.030162 | 0.943868 | 0.961418 | 0.952562 | 0.989425|
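As a sanity check on the table, the F1 column should equal the harmonic mean of the Precision and Recall columns. The sketch below recomputes it for the first and last rows; the helper name `f1_score` is illustrative, and only two rows are copied here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (step, precision, recall, reported F1) copied from the table above.
rows = [
    (1000, 0.898600, 0.938444, 0.918090),
    (22000, 0.943868, 0.961418, 0.952562),
]

for step, precision, recall, reported in rows:
    recomputed = f1_score(precision, recall)
    # The table is rounded to six decimals, so allow a small tolerance.
    agrees = abs(recomputed - reported) < 1e-5
    print(step, round(recomputed, 6), agrees)
```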
config.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "_name_or_path": "./model/",
3
+ "architectures": [
4
+ "BertForTokenClassification"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "directionality": "bidi",
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 768,
12
+ "id2label": {
13
+ "0": "O",
14
+ "1": "B-ORGANIZACAO",
15
+ "10": "I-LEGISLACAO",
16
+ "11": "B-JURISPRUDENCIA",
17
+ "12": "I-JURISPRUDENCIA",
18
+ "2": "I-ORGANIZACAO",
19
+ "3": "B-PESSOA",
20
+ "4": "I-PESSOA",
21
+ "5": "B-TEMPO",
22
+ "6": "I-TEMPO",
23
+ "7": "B-LOCAL",
24
+ "8": "I-LOCAL",
25
+ "9": "B-LEGISLACAO"
26
+ },
27
+ "initializer_range": 0.02,
28
+ "intermediate_size": 3072,
29
+ "label2id": {
30
+ "B-JURISPRUDENCIA": "11",
31
+ "B-LEGISLACAO": "9",
32
+ "B-LOCAL": "7",
33
+ "B-ORGANIZACAO": "1",
34
+ "B-PESSOA": "3",
35
+ "B-TEMPO": "5",
36
+ "I-JURISPRUDENCIA": "12",
37
+ "I-LEGISLACAO": "10",
38
+ "I-LOCAL": "8",
39
+ "I-ORGANIZACAO": "2",
40
+ "I-PESSOA": "4",
41
+ "I-TEMPO": "6",
42
+ "O": "0"
43
+ },
44
+ "layer_norm_eps": 1e-12,
45
+ "max_position_embeddings": 512,
46
+ "model_type": "bert",
47
+ "num_attention_heads": 12,
48
+ "num_hidden_layers": 12,
49
+ "output_past": true,
50
+ "pad_token_id": 0,
51
+ "pooler_fc_size": 768,
52
+ "pooler_num_attention_heads": 12,
53
+ "pooler_num_fc_layers": 3,
54
+ "pooler_size_per_head": 128,
55
+ "pooler_type": "first_token_transform",
56
+ "position_embedding_type": "absolute",
57
+ "torch_dtype": "float32",
58
+ "transformers_version": "4.23.1",
59
+ "type_vocab_size": 2,
60
+ "use_cache": true,
61
+ "vocab_size": 29794
62
+ }
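The `id2label` map in this config follows a BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. A minimal sketch of turning per-token label ids into entity spans is shown below. The `decode_spans` helper is hypothetical, and the example tokens and label ids are hand-made for illustration, not model output.

```python
import json

# id2label as stored in config.json (JSON object keys are always strings).
CONFIG_FRAGMENT = """
{"id2label": {"0": "O", "1": "B-ORGANIZACAO", "2": "I-ORGANIZACAO",
 "3": "B-PESSOA", "4": "I-PESSOA", "5": "B-TEMPO", "6": "I-TEMPO",
 "7": "B-LOCAL", "8": "I-LOCAL", "9": "B-LEGISLACAO", "10": "I-LEGISLACAO",
 "11": "B-JURISPRUDENCIA", "12": "I-JURISPRUDENCIA"}}
"""

# Convert the string keys to integers so predicted class ids can index the map.
id2label = {int(k): v for k, v in json.loads(CONFIG_FRAGMENT)["id2label"].items()}


def decode_spans(tokens, label_ids):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label_id in zip(tokens, label_ids):
        prefix, _, entity_type = id2label[label_id].partition("-")
        if prefix == "I" and entity_type == current_type:
            # Continuation of the span that is currently open.
            current_tokens.append(token)
            continue
        if current_type is not None:
            # Any other label closes the open span.
            spans.append((current_type, " ".join(current_tokens)))
        if prefix == "O":
            current_type, current_tokens = None, []
        else:
            # "B-" opens a span; a stray "I-" is also treated as an opening.
            current_type, current_tokens = entity_type, [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-labelled illustration based on the first widget sentence.
tokens = ["Lei", "n.", "6.194/74", "de", "19", "de", "dezembro", "de", "1974"]
label_ids = [9, 10, 10, 0, 5, 6, 6, 6, 6]
print(decode_spans(tokens, label_ids))
# prints: [('LEGISLACAO', 'Lei n. 6.194/74'), ('TEMPO', '19 de dezembro de 1974')]
```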
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb4492f08d56feeb44cfd11cb122ae37baf434e7fb3f874eb20d926d24047913
3
+ size 433437745
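The three added lines are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of reading such a pointer is given below; the `parse_lfs_pointer` helper is hypothetical and handles only the three keys shown in this diff.

```python
# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bb4492f08d56feeb44cfd11cb122ae37baf434e7fb3f874eb20d926d24047913
size 433437745
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    # The oid has the form "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"], "bytes")
```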
runs/e3_lr2e-05/1665668964.0356135/events.out.tfevents.1665668964.9dd23284ee7b.94.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:40f5ca5dfdee0dba192b0ad07cef9d4fe609af62be8e358c614a6a97904d4b2e
3
+ size 5555
runs/e3_lr2e-05/1665669018.9886503/events.out.tfevents.1665669018.9dd23284ee7b.94.3 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:96612585e659fead3c3eee346e4fccacb1ccedb3d6ae16969d23f2688013d434
3
+ size 5555
runs/e3_lr2e-05/1665669243.3965552/events.out.tfevents.1665669243.9dd23284ee7b.798.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:12bac874c63fae5a1180928dc7326fef5d9ecadcc8b3a7c58ac064fa84af932e
3
+ size 5555
runs/e3_lr2e-05/1665669913.5734515/events.out.tfevents.1665669913.9dd23284ee7b.1054.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f0fd5f2b7af02551df27f055f2f6570693f29c01b31d9dd1047d7e8f98949827
3
+ size 5555
runs/e3_lr2e-05/1665670039.231488/events.out.tfevents.1665670039.9dd23284ee7b.1181.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:677f44c73f115dd17405aa7ca09cd1bdddb634c6a5cd1db17c6532085fe2e1d6
3
+ size 5555
runs/e3_lr2e-05/events.out.tfevents.1665668964.9dd23284ee7b.94.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8787f6f136fd1aa6c77a6e1bc2f6dc31370b8c2c18684e14f85b295b27844288
3
+ size 4541
runs/e3_lr2e-05/events.out.tfevents.1665669018.9dd23284ee7b.94.2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:48d0a81b978e2f80a746c622dbc263dd638a24e532b189ca1526fa5956abaa0f
3
+ size 4543
runs/e3_lr2e-05/events.out.tfevents.1665669243.9dd23284ee7b.798.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e47d8c7372ff5632f40122d7cc1bed48fda6f0a3fa56fc2812b447f924eafc43
3
+ size 4700
runs/e3_lr2e-05/events.out.tfevents.1665669913.9dd23284ee7b.1054.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cec91f837947ea2ba8046e23ef1ad73415724d1edd74e79296650393f1ba98d9
3
+ size 4136
runs/e3_lr2e-05/events.out.tfevents.1665670039.9dd23284ee7b.1181.0 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec42a19101337f0ad0611b1e43ef82dab035c636904aafc79816229eb72322e0
3
+ size 18817
runs/e3_lr2e-05/events.out.tfevents.1665705195.9dd23284ee7b.1181.2 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:63365388c6ee773a53699b6cb2e4d56c36e84130405c0742748a1919e7d2457e
3
+ size 521
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "do_basic_tokenize": true,
4
+ "do_lower_case": false,
5
+ "mask_token": "[MASK]",
6
+ "name_or_path": "./model/",
7
+ "never_split": null,
8
+ "pad_token": "[PAD]",
9
+ "sep_token": "[SEP]",
10
+ "special_tokens_map_file": "/root/.cache/huggingface/hub/models--neuralmind--bert-base-portuguese-cased/snapshots/94d69c95f98f7d5b2a8700c420230ae10def0baa/special_tokens_map.json",
11
+ "strip_accents": null,
12
+ "tokenize_chinese_chars": true,
13
+ "tokenizer_class": "BertTokenizer",
14
+ "unk_token": "[UNK]"
15
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:389017999f38b1c9ff0a840821632744cb86b8b27543e7d2a1eae02f5feee6a3
3
+ size 3439
vocab.txt ADDED
The diff for this file is too large to render. See raw diff