jarodrigues committed
Commit 34426d1 • Parent(s): 89e0682
Update README.md
README.md CHANGED
@@ -97,12 +97,12 @@ We skipped the default filtering of stopwords since it would disrupt the syntact
 As codebase, we resorted to the [DeBERTa V1 Base](https://huggingface.co/microsoft/deberta-base), for English.

 To train [**Albertina PT-PT Base**](https://huggingface.co/PORTULAN/albertina-ptpt-base), the data set was tokenized with the original DeBERTa tokenizer with a 128 token sequence truncation and dynamic padding.
-The model was trained using the maximum available memory capacity resulting in a batch size of 3072 samples (192 samples per GPU
+The model was trained using the maximum available memory capacity resulting in a batch size of 3072 samples (192 samples per GPU).
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 A total of 200 training epochs were performed resulting in approximately 180k steps.
 The model was trained for one day on a2-megagpu-16gb Google Cloud A2 VMs with 16 GPUs, 96 vCPUs and 1.360 GB of RAM.

-To train **Albertina PT-BR Base** we followed the same hyperparameterization as the Albertina
+To train **Albertina PT-BR Base** we followed the same hyperparameterization as the Albertina PT-PT Base model.
 The model was trained with a total of 150 training epochs resulting in approximately 180k steps.
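
For orientation, the corrected hyperparameters above map onto the Hugging Face `transformers` Trainer roughly as in the sketch below. This is a minimal illustration under stated assumptions, not the authors' training script: the corpus file `ptpt_corpus.txt`, the output directory, and the 15% masking probability are placeholders, and the reported run was distributed across 16 GPUs (192 samples per GPU for a global batch of 3072) rather than a single process.

```python
# Minimal sketch of the pre-training setup described in the README
# (original DeBERTa tokenizer, 128-token truncation, dynamic padding,
# lr 1e-5 with linear decay and 10k warm-up steps). The corpus file,
# output path and masking probability are assumptions, not the authors'.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# DeBERTa V1 Base codebase and tokenizer, as stated above.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")

# Placeholder corpus file; the actual PT-PT data set is not loaded like this.
dataset = load_dataset("text", data_files={"train": "ptpt_corpus.txt"})["train"]

def tokenize(batch):
    # Truncate to 128 tokens; padding is deferred to the collator so that
    # each batch is padded dynamically to its longest sequence.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic padding plus masked-LM label creation (15% masking is an assumption).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="albertina-ptpt-base",   # placeholder output path
    per_device_train_batch_size=192,    # 192 samples/GPU x 16 GPUs = 3072 global
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=200,               # ~180k steps in the reported run
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Padding in the collator rather than to a fixed length at tokenization time is what "dynamic padding" refers to here: each batch is only padded up to its own longest sequence, which saves compute on shorter samples.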