---
language: pt
tags:
- portuguese
- brazil
- pt_BR
widget:
- text: gostei muito dessa
---

# BR_BERTo

Portuguese (Brazil) model for text inference.

## Params

Trained on a corpus of 6_993_330 sentences.

- Vocab size: 150_000
- RobertaForMaskedLM size: 512
- Num train epochs: 3
- Time to train: ~10 days (on GCP with an NVIDIA T4)

I followed the great tutorial from the Hugging Face team:

[How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)

More info here: [BR_BERTo](https://github.com/rdenadai/BR-BERTo)
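
## Usage

A minimal fill-mask sketch using the `transformers` pipeline. The Hub identifier `rdenadai/BR_BERTo` is an assumption inferred from the GitHub repository name; adjust it if the model is hosted under a different name.

```python
from transformers import pipeline

# Load the masked-language-model pipeline.
# NOTE: "rdenadai/BR_BERTo" is an assumed Hub identifier, not confirmed by this card.
fill_mask = pipeline("fill-mask", model="rdenadai/BR_BERTo")

# RoBERTa-style models use "<mask>" as the mask token.
for prediction in fill_mask("gostei muito dessa <mask>."):
    # Each prediction is a dict with the candidate token and its score.
    print(prediction["token_str"], prediction["score"])
```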