---
language: pt
tags:
- portuguese
- brazil
- pt_BR
widget:
- text: gostei muito dessa <mask>
---

# BR_BERTo

Portuguese (Brazil) RoBERTa model for masked-token text inference (fill-mask).
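
A minimal usage sketch of the fill-mask task the widget above demonstrates; the hub id `rdenadai/BR_BERTo` is assumed here (inferred from the GitHub repository linked below):

```python
from transformers import pipeline

# Load the model as a fill-mask pipeline (hub id assumed from the repo name).
fill_mask = pipeline("fill-mask", model="rdenadai/BR_BERTo")

# Same sentence as the widget above: "gostei muito dessa <mask>"
# ("I really liked this <mask>").
for prediction in fill_mask("gostei muito dessa <mask>"):
    print(prediction["token_str"], prediction["score"])
```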

## Params

Trained on a corpus of 6_993_330 sentences.

- Vocab size: 150_000
- RobertaForMaskedLM size: 512
- Num train epochs: 3
- Time to train: ~10 days (on GCP with an NVIDIA T4)
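
As a rough illustration of how these numbers could map onto a model definition (a minimal sketch, not the author's actual training script: layer and head counts are left at library defaults, and the 512 above is read here as the maximum sequence length):

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Hypothetical config mirroring the listed params; only vocab_size and the
# 512 value (interpreted as max sequence length) come from this card.
config = RobertaConfig(
    vocab_size=150_000,
    max_position_embeddings=514,  # 512 tokens + 2 positions RoBERTa reserves
)
model = RobertaForMaskedLM(config=config)
print(f"{model.num_parameters():,} parameters")
```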

I followed the great tutorial from the Hugging Face team:

[How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)

More info here:

[BR_BERTo](https://github.com/rdenadai/BR-BERTo)