Update README.md
README.md
@@ -1,8 +1,8 @@
 ---
 language: it
-license:
+license: afl-3.0
 widget:
-- text:
+- text: Il <mask> ha chiesto revocarsi l'obbligo di pagamento
 ---
 <img src="https://huggingface.co/dlicari/Italian-Legal-BERT-SC/resolve/main/ITALIAN_LEGAL_BERT-SC.jpg" width="600"/>
 
@@ -13,6 +13,4 @@ It is the [ITALIAN-LEGAL-BERT](https://huggingface.co/dlicari/Italian-Legal-BERT
 It was trained from scratch using a larger training dataset, 6.6GB of civil and criminal cases.
 We used [CamemBERT](https://huggingface.co/docs/transformers/main/en/model_doc/camembert) architecture with a language modeling head on top, AdamW Optimizer, initial learning rate 2e-5 (with linear learning rate decay), sequence length 512, batch size 18, 1 million training steps,
 device 8*NVIDIA A100 40GB using distributed data parallel (each step performs 8 batches). It uses SentencePiece tokenization trained from scratch on a subset of training set (5 milions sentences)
-and vocabulary size of 32000
-
-
+and vocabulary size of 32000
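The training figures quoted in the README context lines can be turned into a quick back-of-the-envelope check. The sketch below is not the authors' training code. It assumes that "batch size 18" is the per-device batch and that "each step performs 8 batches" means one batch per GPU, and it assumes a linear schedule with no warmup that decays to zero at the final step (the README only says "linear learning rate decay"). All function and constant names are hypothetical.

```python
# Back-of-the-envelope sketch of the training setup described in the README.
# Values marked "README" are quoted from the diff; everything else is an assumption.

PER_DEVICE_BATCH = 18      # README: "batch size 18" (assumed to be per device)
NUM_DEVICES = 8            # README: "8*NVIDIA A100 40GB"
SEQ_LEN = 512              # README: "sequence length 512"
TOTAL_STEPS = 1_000_000    # README: "1 million training steps"
INITIAL_LR = 2e-5          # README: "initial learning rate 2e-5"


def sequences_per_step() -> int:
    """Sequences consumed per optimizer step under data parallelism."""
    return PER_DEVICE_BATCH * NUM_DEVICES


def max_tokens_per_step() -> int:
    """Upper bound on tokens per step, if every sequence is full length."""
    return sequences_per_step() * SEQ_LEN


def linear_decay_lr(step: int) -> float:
    """Learning rate at a given step.

    Assumption: no warmup, and the rate reaches 0 at TOTAL_STEPS.
    """
    step = min(max(step, 0), TOTAL_STEPS)  # clamp to the valid range
    return INITIAL_LR * (1 - step / TOTAL_STEPS)


if __name__ == "__main__":
    print("sequences per step:", sequences_per_step())
    print("max tokens per step:", max_tokens_per_step())
    for s in (0, 250_000, 500_000, 1_000_000):
        print(f"lr at step {s}: {linear_decay_lr(s):.2e}")
```

Under these assumptions each optimizer step sees 18 × 8 sequences. If the actual run used warmup or a non-zero final learning rate, `linear_decay_lr` would need to be adjusted accordingly.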