pierreguillou committed
Commit 770d60c
Parent(s): 8f55196

Update README.md

Files changed (1):
  1. README.md (+106, -47)

README.md CHANGED
@@ -22,19 +22,19 @@ model-index:
 metrics:
 - name: F1
   type: f1
-  value: 0.8733423827921062
 - name: Precision
   type: precision
-  value: 0.8487923685812868
 - name: Recall
   type: recall
-  value: 0.8993548387096775
 - name: Accuracy
   type: accuracy
   value: 0.9759397808828684
 - name: Loss
   type: loss
-  value: 0.10249536484479904
 widget:
 - text: "Ao Instituto Médico Legal da jurisdição do acidente ou da residência cumpre fornecer, no prazo de 90 dias, laudo à vítima (art. 5, § 5, Lei n. 6.194/74 de 19 de dezembro de 1974), função técnica que pode ser suprida por prova pericial realizada por ordem do juízo da causa, ou por prova técnica realizada no âmbito administrativo que se mostre coerente com os demais elementos de prova constante dos autos."
 - text: "Acrescento que não há de se falar em violação do artigo 114, § 3º, da Constituição Federal, posto que referido dispositivo revela-se impertinente, tratando da possibilidade de ajuizamento de dissídio coletivo pelo Ministério Público do Trabalho nos casos de greve em atividade essencial."
@@ -46,11 +46,11 @@ widget:
 **ner-bert-base-portuguese-cased-lenerbr** is a NER model (token classification) in the legal domain in Portuguese that was finetuned on 20/12/2021 in Google Colab from the model [pierreguillou/bert-base-cased-pt-lenerbr](https://huggingface.co/pierreguillou/bert-base-cased-pt-lenerbr) on the dataset [LeNER_br](https://huggingface.co/datasets/lener_br) with a NER objective.
 
 Due to the small size of BERTimbau base and of the finetuning dataset, the model overfitted before reaching the end of training. Here are the overall final metrics on the validation dataset (*note: see the paragraph "Validation metrics by Named Entity" for detailed metrics*):
- - **f1**: 0.8733423827921062
- - **precision**: 0.8487923685812868
- - **recall**: 0.8993548387096775
  - **accuracy**: 0.9759397808828684
- - **loss**: 0.10249536484479904
 
 Check as well the [large version of this model](https://huggingface.co/pierreguillou/ner-bert-large-cased-pt-lenerbr) with an f1 of 0.908.
@@ -117,20 +117,20 @@ The notebook of finetuning ([HuggingFace_Notebook_token_classification_NER_LeNER
 ### Hyperparameters
 
 #### batch, learning rate...
- - per_device_batch_size = 4
  - gradient_accumulation_steps = 2
  - learning_rate = 2e-5
- - num_train_epochs = 3
  - weight_decay = 0.01
  - optimizer = AdamW
  - betas = (0.9,0.999)
  - epsilon = 1e-08
  - lr_scheduler_type = linear
- - seed = 42
 
 #### save model & load best model
- - save_total_limit = 3
- - logging_steps = 290
  - eval_steps = logging_steps
  - evaluation_strategy = 'steps'
  - logging_strategy = 'steps'
@@ -147,53 +147,112 @@ The notebook of finetuning ([HuggingFace_Notebook_token_classification_NER_LeNER
 
 ````
 Num examples = 7828
- Num Epochs = 3
- Instantaneous batch size per device = 4
- Total train batch size (w. parallel, distributed & accumulation) = 8
 Gradient Accumulation steps = 2
- Total optimization steps = 2934
 
 Step Training Loss Validation Loss Precision Recall F1 Accuracy
- 290 0.314600 0.163042 0.735828 0.697849 0.716336 0.949198
- 580 0.086900 0.123495 0.779540 0.824301 0.801296 0.965807
- 870 0.072800 0.106785 0.798481 0.858925 0.827600 0.968626
- 1160 0.046300 0.109921 0.824576 0.877419 0.850177 0.973243
- 1450 0.036600 0.102495 0.848792 0.899355 0.873342 0.975940
- 1740 0.033400 0.121514 0.821681 0.899785 0.858961 0.967071
- 2030 0.034700 0.115568 0.846849 0.887097 0.866506 0.970607
- 2320 0.018000 0.108600 0.840258 0.895914 0.867194 0.973730
 ````
 
 ### Validation metrics by Named Entity
 ````
 Num examples = 1177
 
- {'JURISPRUDENCIA': {'f1': 0.7069834413246942,
  'number': 657,
- 'precision': 0.6707650273224044,
- 'recall': 0.7473363774733638},
- 'LEGISLACAO': {'f1': 0.8256227758007118,
  'number': 571,
- 'precision': 0.8390596745027125,
- 'recall': 0.8126094570928196},
- 'LOCAL': {'f1': 0.7688564476885645,
  'number': 194,
- 'precision': 0.728110599078341,
- 'recall': 0.8144329896907216},
- 'ORGANIZACAO': {'f1': 0.8548387096774193,
  'number': 1340,
- 'precision': 0.8062169312169312,
- 'recall': 0.9097014925373135},
- 'PESSOA': {'f1': 0.9826697892271662,
  'number': 1072,
- 'precision': 0.9868297271872061,
- 'recall': 0.9785447761194029},
- 'TEMPO': {'f1': 0.9615846338535414,
  'number': 816,
- 'precision': 0.9423529411764706,
- 'recall': 0.9816176470588235},
- 'overall_accuracy': 0.9759397808828684,
- 'overall_f1': 0.8733423827921062,
- 'overall_precision': 0.8487923685812868,
- 'overall_recall': 0.8993548387096775}
 ````
 
@@ -22,19 +22,19 @@ model-index:
 metrics:
 - name: F1
   type: f1
+  value: 0.8926146010186757
 - name: Precision
   type: precision
+  value: 0.8810222036028488
 - name: Recall
   type: recall
+  value: 0.9045161290322581
 - name: Accuracy
   type: accuracy
   value: 0.9759397808828684
 - name: Loss
   type: loss
+  value: 0.18803243339061737
 widget:
 - text: "Ao Instituto Médico Legal da jurisdição do acidente ou da residência cumpre fornecer, no prazo de 90 dias, laudo à vítima (art. 5, § 5, Lei n. 6.194/74 de 19 de dezembro de 1974), função técnica que pode ser suprida por prova pericial realizada por ordem do juízo da causa, ou por prova técnica realizada no âmbito administrativo que se mostre coerente com os demais elementos de prova constante dos autos."
 - text: "Acrescento que não há de se falar em violação do artigo 114, § 3º, da Constituição Federal, posto que referido dispositivo revela-se impertinente, tratando da possibilidade de ajuizamento de dissídio coletivo pelo Ministério Público do Trabalho nos casos de greve em atividade essencial."
 
@@ -46,11 +46,11 @@ widget:
 **ner-bert-base-portuguese-cased-lenerbr** is a NER model (token classification) in the legal domain in Portuguese that was finetuned on 20/12/2021 in Google Colab from the model [pierreguillou/bert-base-cased-pt-lenerbr](https://huggingface.co/pierreguillou/bert-base-cased-pt-lenerbr) on the dataset [LeNER_br](https://huggingface.co/datasets/lener_br) with a NER objective.
 
 Due to the small size of BERTimbau base and of the finetuning dataset, the model overfitted before reaching the end of training. Here are the overall final metrics on the validation dataset (*note: see the paragraph "Validation metrics by Named Entity" for detailed metrics*):
+ - **f1**: 0.8926146010186757
+ - **precision**: 0.8810222036028488
+ - **recall**: 0.9045161290322581
  - **accuracy**: 0.9759397808828684
+ - **loss**: 0.18803243339061737
 
 Check as well the [large version of this model](https://huggingface.co/pierreguillou/ner-bert-large-cased-pt-lenerbr) with an f1 of 0.908.
 
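A minimal inference sketch with the Hugging Face Transformers `pipeline` API. The repository id below is assumed from the model name used in this card, and the `aggregation_strategy` value is an assumption, not something the card prescribes:

````python
# Hedged usage sketch; the repo id is assumed from the model name in this card.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "pierreguillou/ner-bert-base-portuguese-cased-lenerbr"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# "simple" aggregation merges word pieces back into whole entity spans.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = ("Acrescento que não há de se falar em violação do artigo 114, § 3º, "
        "da Constituição Federal.")

for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 3))
````

The entity groups to expect are the six classes reported in the per-entity metrics below: PESSOA, TEMPO, LOCAL, ORGANIZACAO, LEGISLACAO and JURISPRUDENCIA.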
@@ -117,20 +117,20 @@ The notebook of finetuning ([HuggingFace_Notebook_token_classification_NER_LeNER
 ### Hyperparameters
 
 #### batch, learning rate...
+ - per_device_batch_size = 2
  - gradient_accumulation_steps = 2
  - learning_rate = 2e-5
+ - num_train_epochs = 10
  - weight_decay = 0.01
  - optimizer = AdamW
  - betas = (0.9,0.999)
  - epsilon = 1e-08
  - lr_scheduler_type = linear
+ - seed = 7
 
 #### save model & load best model
+ - save_total_limit = 2
+ - logging_steps = 300
  - eval_steps = logging_steps
  - evaluation_strategy = 'steps'
  - logging_strategy = 'steps'
 
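As a rough guide, the updated hyperparameters above map onto a Transformers `TrainingArguments` object along the following lines. This is a hedged sketch: `output_dir`, `save_strategy` and `load_best_model_at_end` are assumptions (the card only says "save model & load best model"), everything else mirrors the list above:

````python
# Hedged sketch of the training configuration listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ner-bert-base-cased-pt-lenerbr",  # placeholder, not from the card
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=10,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    seed=7,
    save_total_limit=2,
    logging_steps=300,
    eval_steps=300,                 # eval_steps = logging_steps
    evaluation_strategy="steps",
    logging_strategy="steps",
    save_strategy="steps",          # assumed; needed for load_best_model_at_end
    load_best_model_at_end=True,    # assumed from "load best model"
)
````

With per_device_batch_size = 2 and gradient_accumulation_steps = 2, the effective train batch size is 4, which matches the "Total train batch size" reported in the training log below.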
@@ -147,53 +147,112 @@ The notebook of finetuning ([HuggingFace_Notebook_token_classification_NER_LeNER
 
 ````
 Num examples = 7828
+ Num Epochs = 10
+ Instantaneous batch size per device = 2
+ Total train batch size (w. parallel, distributed & accumulation) = 4
 Gradient Accumulation steps = 2
+ Total optimization steps = 19570
 
 Step Training Loss Validation Loss Precision Recall F1 Accuracy
+ 300 0.127600 0.178613 0.722909 0.741720 0.732194 0.948802
+ 600 0.088200 0.136965 0.733636 0.867742 0.795074 0.963079
+ 900 0.078000 0.128858 0.791912 0.838065 0.814335 0.965243
+ 1200 0.077800 0.126345 0.815400 0.865376 0.839645 0.967849
+ 1500 0.074100 0.148207 0.779274 0.895914 0.833533 0.960184
+ 1800 0.059500 0.116634 0.830829 0.868172 0.849090 0.969342
+ 2100 0.044500 0.208459 0.887150 0.816559 0.850392 0.960535
+ 2400 0.029400 0.136352 0.867821 0.851398 0.859531 0.970271
+ 2700 0.025000 0.165837 0.814881 0.878495 0.845493 0.961235
+ 3000 0.038400 0.120629 0.811719 0.893763 0.850768 0.971506
+ 3300 0.026200 0.175094 0.823435 0.882581 0.851983 0.962957
+ 3600 0.025600 0.178438 0.881095 0.886022 0.883551 0.963689
+ 3900 0.041000 0.134648 0.789035 0.916129 0.847846 0.967681
+ 4200 0.026700 0.130178 0.821275 0.903226 0.860303 0.972313
+ 4500 0.018500 0.139294 0.844016 0.875054 0.859255 0.971140
+ 4800 0.020800 0.197811 0.892504 0.873118 0.882705 0.965883
+ 5100 0.019300 0.161239 0.848746 0.888172 0.868012 0.967849
+ 5400 0.024000 0.139131 0.837507 0.913333 0.873778 0.970591
+ 5700 0.018400 0.157223 0.899754 0.864731 0.881895 0.970210
+ 6000 0.023500 0.137022 0.883018 0.873333 0.878149 0.973243
+ 6300 0.009300 0.181448 0.840490 0.900860 0.869628 0.968290
+ 6600 0.019200 0.173125 0.821316 0.896559 0.857290 0.966736
+ 6900 0.016100 0.143160 0.789938 0.904946 0.843540 0.968245
+ 7200 0.017000 0.145755 0.823274 0.897634 0.858848 0.969037
+ 7500 0.012100 0.159342 0.825694 0.883226 0.853491 0.967468
+ 7800 0.013800 0.194886 0.861237 0.859570 0.860403 0.964771
+ 8100 0.008000 0.140271 0.829914 0.896129 0.861752 0.971567
+ 8400 0.010300 0.143318 0.826844 0.908817 0.865895 0.973578
+ 8700 0.015000 0.143392 0.847336 0.889247 0.867786 0.973365
+ 9000 0.006000 0.143512 0.847795 0.905591 0.875741 0.972892
+ 9300 0.011800 0.138747 0.827133 0.894194 0.859357 0.971673
+ 9600 0.008500 0.159490 0.837030 0.909032 0.871546 0.970028
+ 9900 0.010700 0.159249 0.846692 0.910968 0.877655 0.970546
+ 10200 0.008100 0.170069 0.848288 0.900645 0.873683 0.969113
+ 10500 0.004800 0.183795 0.860317 0.899355 0.879403 0.969570
+ 10800 0.010700 0.157024 0.837838 0.906667 0.870894 0.971094
+ 11100 0.003800 0.164286 0.845312 0.880215 0.862410 0.970744
+ 11400 0.009700 0.204025 0.884294 0.887527 0.885907 0.968854
+ 11700 0.008900 0.162819 0.829415 0.887742 0.857588 0.970530
+ 12000 0.006400 0.164296 0.852666 0.901075 0.876202 0.971414
+ 12300 0.007100 0.143367 0.852959 0.895699 0.873807 0.973669
+ 12600 0.015800 0.153383 0.859224 0.900430 0.879345 0.972679
+ 12900 0.006600 0.173447 0.869954 0.899140 0.884306 0.970927
+ 13200 0.006800 0.163234 0.856849 0.897204 0.876563 0.971795
+ 13500 0.003200 0.167164 0.850867 0.907957 0.878485 0.971231
+ 13800 0.003600 0.148950 0.867801 0.910538 0.888656 0.976961
+ 14100 0.003500 0.155691 0.847621 0.907957 0.876752 0.974127
+ 14400 0.003300 0.157672 0.846553 0.911183 0.877680 0.974584
+ 14700 0.002500 0.169965 0.847804 0.917634 0.881338 0.973045
+ 15000 0.003400 0.177099 0.842199 0.912473 0.875929 0.971155
+ 15300 0.006000 0.164151 0.848928 0.911183 0.878954 0.973258
+ 15600 0.002400 0.174305 0.847437 0.906667 0.876052 0.971765
+ 15900 0.004100 0.174561 0.852929 0.907957 0.879583 0.972907
+ 16200 0.002600 0.172626 0.843263 0.907097 0.874016 0.972100
+ 16500 0.002100 0.185302 0.841108 0.907312 0.872957 0.970485
+ 16800 0.002900 0.175638 0.840557 0.909247 0.873554 0.971704
+ 17100 0.001600 0.178750 0.857056 0.906452 0.881062 0.971765
+ 17400 0.003900 0.188910 0.853619 0.907957 0.879950 0.970835
+ 17700 0.002700 0.180822 0.864699 0.907097 0.885390 0.972283
+ 18000 0.001300 0.179974 0.868150 0.906237 0.886785 0.973060
+ 18300 0.000800 0.188032 0.881022 0.904516 0.892615 0.972572
+ 18600 0.002700 0.183266 0.868601 0.901290 0.884644 0.972298
+ 18900 0.001600 0.180301 0.862041 0.903011 0.882050 0.972344
+ 19200 0.002300 0.183432 0.855370 0.904301 0.879155 0.971109
+ 19500 0.001800 0.183381 0.854501 0.904301 0.878696 0.971186
  ````
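The log above is the kind of output the Transformers `Trainer` prints when evaluation runs every `logging_steps` (300 here). The following is a hedged, end-to-end sketch of how such a run can be wired up; it follows the generic token-classification recipe rather than the author's exact notebook, and it reuses the `training_args` sketched after the hyperparameter list:

````python
# Hedged sketch of a finetuning run that would produce a log like the one above.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer)

dataset = load_dataset("lener_br")
label_list = dataset["train"].features["ner_tags"].feature.names

checkpoint = "pierreguillou/bert-base-cased-pt-lenerbr"  # base model named in the card
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(label_list))

def tokenize_and_align_labels(examples):
    # Re-align word-level NER tags with the word pieces produced by the tokenizer.
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous, labels = None, []
        for word_id in word_ids:
            if word_id is None or word_id == previous:
                labels.append(-100)          # special tokens / extra word pieces are ignored
            else:
                labels.append(tags[word_id])
            previous = word_id
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

tokenized_datasets = dataset.map(tokenize_and_align_labels, batched=True)
data_collator = DataCollatorForTokenClassification(tokenizer)
seqeval = evaluate.load("seqeval")

def compute_metrics(eval_pred):
    # Produces the Precision / Recall / F1 / Accuracy columns of the log above.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    true_preds = [[label_list[p] for p, l in zip(pred, lab) if l != -100]
                  for pred, lab in zip(predictions, labels)]
    true_labels = [[label_list[l] for p, l in zip(pred, lab) if l != -100]
                   for pred, lab in zip(predictions, labels)]
    results = seqeval.compute(predictions=true_preds, references=true_labels)
    return {"precision": results["overall_precision"], "recall": results["overall_recall"],
            "f1": results["overall_f1"], "accuracy": results["overall_accuracy"]}

trainer = Trainer(model=model, args=training_args,          # training_args: see earlier sketch
                  train_dataset=tokenized_datasets["train"],
                  eval_dataset=tokenized_datasets["validation"],
                  data_collator=data_collator, tokenizer=tokenizer,
                  compute_metrics=compute_metrics)
trainer.train()
````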
 
 ### Validation metrics by Named Entity
 ````
 Num examples = 1177
 
+ {'JURISPRUDENCIA': {'f1': 0.7016574585635359,
  'number': 657,
+ 'precision': 0.6422250316055625,
+ 'recall': 0.7732115677321156},
+ 'LEGISLACAO': {'f1': 0.8839681133746677,
  'number': 571,
+ 'precision': 0.8942652329749103,
+ 'recall': 0.8739054290718039},
+ 'LOCAL': {'f1': 0.8253968253968254,
  'number': 194,
+ 'precision': 0.7368421052631579,
+ 'recall': 0.9381443298969072},
+ 'ORGANIZACAO': {'f1': 0.8934049079754601,
  'number': 1340,
+ 'precision': 0.918769716088328,
+ 'recall': 0.8694029850746269},
+ 'PESSOA': {'f1': 0.982653539615565,
  'number': 1072,
+ 'precision': 0.9877474081055608,
+ 'recall': 0.9776119402985075},
+ 'TEMPO': {'f1': 0.9657657657657657,
  'number': 816,
+ 'precision': 0.9469964664310954,
+ 'recall': 0.9852941176470589},
+ 'overall_accuracy': 0.9725722644643211,
+ 'overall_f1': 0.8926146010186757,
+ 'overall_precision': 0.8810222036028488,
+ 'overall_recall': 0.9045161290322581}
  ````
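The dict above has the format returned by the seqeval metric, before its overall_* values are reduced to the summary numbers reported at the top of the card. A hedged sketch of how such a per-entity report can be produced (the toy IOB2 sequences are illustrative only, not data from LeNER_br):

````python
# Hedged sketch: per-entity precision / recall / f1 / number plus overall_* aggregates,
# in the same format as the dict above. The toy label sequences are made up.
import evaluate

seqeval = evaluate.load("seqeval")

references  = [["B-PESSOA", "I-PESSOA", "O", "B-TEMPO", "O"]]
predictions = [["B-PESSOA", "I-PESSOA", "O", "O", "O"]]

print(seqeval.compute(predictions=predictions, references=references))
````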