---
language:
- pt
tags:
- generated_from_trainer
datasets:
- pierreguillou/lener_br_finetuning_language_model
model-index:
- name: checkpoints
  results:
  - task:
      name: Fill Mask
      type: fill-mask
    dataset:
      name: pierreguillou/lener_br_finetuning_language_model
      type: pierreguillou/lener_br_finetuning_language_model
    metrics:
    - name: Loss
      type: loss
      value: 1.127950
widget:
- text: "Com efeito, se tal fosse possível, o Poder [MASK] – que não dispõe de função legislativa – passaria a desempenhar atribuição que lhe é institucionalmente estranha (a de legislador positivo), usurpando, desse modo, no contexto de um sistema de poderes essencialmente limitados, competência que não lhe pertence, com evidente transgressão ao princípio constitucional da separação de poderes."
---

## (BERT large) Language modeling in the legal domain in Portuguese (LeNER-Br)

**bert-large-cased-pt-lenerbr** is a language model for the Portuguese legal domain. It was fine-tuned on 20/12/2021 in Google Colab from [BERTimbau large](https://huggingface.co/neuralmind/bert-large-portuguese-cased) on the dataset [LeNER-Br language modeling](https://huggingface.co/datasets/pierreguillou/lener_br_finetuning_language_model) with a masked language modeling (MLM) objective.

## Widget & APP

You can test this model directly in the widget on this page.

## Using the model for inference in production

````
# install PyTorch: check https://pytorch.org/
# install transformers: !pip install transformers
from transformers import AutoTokenizer, AutoModelForMaskedLM

# load the fine-tuned tokenizer and masked-LM model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("pierreguillou/bert-large-cased-pt-lenerbr")
model = AutoModelForMaskedLM.from_pretrained("pierreguillou/bert-large-cased-pt-lenerbr")
````
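
Beyond loading the weights, a quick way to query the model is the `fill-mask` pipeline from transformers. The sketch below is not part of the original card; it runs an excerpt of the widget sentence from the metadata above through the pipeline.

````
from transformers import pipeline

# fill-mask pipeline loaded directly from the Hub model id
fill_mask = pipeline("fill-mask", model="pierreguillou/bert-large-cased-pt-lenerbr")

# excerpt of the widget sentence above, with one [MASK] token
text = ("Com efeito, se tal fosse possível, o Poder [MASK] – que não dispõe "
        "de função legislativa – passaria a desempenhar atribuição que lhe é "
        "institucionalmente estranha (a de legislador positivo).")

# each prediction carries the candidate token and its probability
for pred in fill_mask(text):
    print(pred["token_str"], round(pred["score"], 4))
````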

## Training procedure

### Notebook

The fine-tuning notebook ([Finetuning_language_model_BERtimbau_LeNER_Br.ipynb](https://github.com/piegu/language-models/blob/master/Finetuning_language_model_BERtimbau_LeNER_Br.ipynb)) is available on GitHub.
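
The full procedure is in the notebook. As a minimal sketch (not the notebook's exact code), the set-up below reproduces a masked language modeling fine-tuning with the transformers `Trainer` and the hyperparameters reported in the results (5 epochs, per-device batch size 2, gradient accumulation 4, evaluation every 100 steps); the text column name, split names, sequence length, and output directory are assumptions.

````
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# start from BERTimbau large and the LeNER-Br language-modeling dataset
tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-large-portuguese-cased")
model = AutoModelForMaskedLM.from_pretrained("neuralmind/bert-large-portuguese-cased")
dataset = load_dataset("pierreguillou/lener_br_finetuning_language_model")

# assumption: the dataset exposes a "text" column
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# dynamic masking with the standard 15% MLM probability
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="checkpoints",          # matches the model-index name above
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],         # assumption: "train" split
    eval_dataset=tokenized["validation"],     # assumption: "validation" split
    data_collator=collator,
)
trainer.train()
````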

### Training results

````
Num examples = 3227
Num Epochs = 5
Instantaneous batch size per device = 2
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 4
Total optimization steps = 2015

Step   Training Loss   Validation Loss
100    1.616700        1.366015
200    1.452000        1.312473
300    1.431100        1.253055
400    1.407500        1.264705
500    1.301900        1.243277
600    1.317800        1.233684
700    1.319100        1.211826
800    1.303800        1.190818
900    1.262800        1.171898
1000   1.235900        1.146275
1100   1.221900        1.149027
1200   1.226200        1.127950
1300   1.201700        1.172729
1400   1.198200        1.145363
````
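
A side note not in the original card: the lowest validation loss above, 1.127950 at step 1200, is the value reported as the Loss metric in the model-index metadata. Since it is a cross-entropy loss, a rough masked-LM perplexity follows by exponentiation:

````
import math

best_val_loss = 1.127950            # step 1200 in the table above
print(math.exp(best_val_loss))      # ≈ 3.09, the corresponding MLM perplexity
````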