---
language: pt
tags:
- portuguese
- brazil
- pt_BR
widget:
- text: gostei muito dessa <mask>
---

# BR_BERTo

A Portuguese (Brazilian) model for masked language modeling (fill-mask inference).
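
A minimal usage sketch with the `transformers` fill-mask pipeline, assuming the model is published on the Hugging Face Hub under the id `rdenadai/BR_BERTo`:

```python
from transformers import pipeline

# Load the model for fill-mask inference; the model id below is an
# assumption (it matches this repository's name, not a confirmed Hub id).
fill_mask = pipeline("fill-mask", model="rdenadai/BR_BERTo")

# Same example as the widget above: "gostei muito dessa <mask>"
# ("I really liked this <mask>").
for prediction in fill_mask("gostei muito dessa <mask>"):
    print(prediction["token_str"], prediction["score"])
```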

## Params

Trained on a corpus of 6,993,330 sentences.

- Vocab size: 150,000
- RobertaForMaskedLM size: 512
- Num train epochs: 3
- Time to train: ~10 days (on GCP with an NVIDIA T4)
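
As a rough sketch, the parameters above might map onto a `RobertaConfig` as follows, reading "size: 512" as the maximum sequence length (hence `max_position_embeddings=514`, as in the tutorial linked below). Every value not listed above is an assumption taken from the tutorial's defaults, not the actual training setup:

```python
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=150_000,            # vocab size from the list above
    max_position_embeddings=514,   # 512 tokens + 2 RoBERTa offset positions (assumed)
    num_attention_heads=12,        # assumption: tutorial default
    num_hidden_layers=6,           # assumption: tutorial default
    type_vocab_size=1,             # assumption: tutorial default
)
model = RobertaForMaskedLM(config=config)
print(f"{model.num_parameters():,} parameters")
```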

I followed the great tutorial from the Hugging Face team:

[How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)

More info here:

[BR_BERTo](https://github.com/rdenadai/BR-BERTo)