jarodrigues committed on
Commit 34426d1
1 Parent(s): 89e0682

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -97,12 +97,12 @@ We skipped the default filtering of stopwords since it would disrupt the syntact
 As codebase, we resorted to the [DeBERTa V1 Base](https://huggingface.co/microsoft/deberta-base), for English.
 
 To train [**Albertina PT-PT Base**](https://huggingface.co/PORTULAN/albertina-ptpt-base), the data set was tokenized with the original DeBERTa tokenizer with a 128 token sequence truncation and dynamic padding.
-The model was trained using the maximum available memory capacity resulting in a batch size of 3072 samples (192 samples per GPU and applying gradient accumulation in order to approximate the batch size of the PT-BR model).
+The model was trained using the maximum available memory capacity resulting in a batch size of 3072 samples (192 samples per GPU).
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 A total of 200 training epochs were performed resulting in approximately 180k steps.
 The model was trained for one day on a2-megagpu-16gb Google Cloud A2 VMs with 16 GPUs, 96 vCPUs and 1.360 GB of RAM.
 
-To train **Albertina PT-BR Base** we followed the same hyperparameterization as the Albertina-PT-PT Base model.
+To train **Albertina PT-BR Base** we followed the same hyperparameterization as the Albertina PT-PT Base model.
 The model was trained with a total of 150 training epochs resulting in approximately 180k steps.
 
 
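
For reference, the hyperparameters described in the updated passage can be written down as a minimal training sketch. This is not the authors' training script: the corpus file name, output directory, and the use of the Hugging Face `Trainer` are assumptions made for illustration; only the DeBERTa V1 Base codebase, the 128-token truncation with dynamic padding, the 192 samples per GPU (3072 globally over 16 GPUs), the 1e-5 learning rate with linear decay and 10k warm-up steps, and the 200 epochs come from the README text.

```python
# Minimal sketch of the setup the README describes; not the authors' actual script.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
)

# DeBERTa V1 Base codebase and original tokenizer, as stated in the README.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")

# Hypothetical corpus placeholder; the PT-PT training corpus is not part of this commit.
dataset = load_dataset("text", data_files={"train": "ptpt_corpus.txt"})

def tokenize(batch):
    # 128-token sequence truncation; padding is deferred to the collator
    # so each batch is padded dynamically to its longest sequence.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic padding plus masked-language-modelling targets at batch-assembly time.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True)

# 192 samples per GPU; with 16 GPUs this yields the 3072-sample global batch.
args = TrainingArguments(
    output_dir="albertina-ptpt-base",   # assumed output path
    per_device_train_batch_size=192,
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=200,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

For the PT-BR Base variant, the passage above implies the same arguments with `num_train_epochs=150`.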