jarodrigues committed
Commit cb47993
Parent: cba1aa1

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -107,7 +107,7 @@ As codebase, we resorted to the [DeBERTa V2 XLarge](https://huggingface.co/micro
 
 To train **Albertina-PT-PT**, the data set was tokenized with the original DeBERTa tokenizer with a 128 token sequence truncation and dynamic padding.
 The model was trained using the maximum available memory capacity resulting in a batch size of 832 samples (52 samples per GPU and applying gradient accumulation in order to approximate the batch size of the PT-BR model).
-Similarly to the PT-BR variant above, we opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
+Similarly to the PT-BR variant, we opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 However, since the number of training examples is approximately twice of that in the PT-BR variant, we reduced the number of training epochs to half and completed only 25 epochs, which resulted in approximately 245k steps.
 The model was trained for 3 days on a2-highgpu-8gb Google Cloud A2 VMs with 8 GPUs, 96 vCPUs and 680 GB of RAM.
 
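For reference, the hyperparameters named in the changed passage map onto the Hugging Face Trainer API roughly as sketched below. This is a minimal sketch under stated assumptions, not the authors' released training script: dataset loading is elided, the masking configuration is not specified in this excerpt (the collator default is used), and the gradient accumulation factor of 2 is an inference from the reported numbers (52 samples per GPU × 8 GPUs × 2 = 832).

```python
# Minimal sketch of the training setup described in the README hunk above.
# Assumptions are marked in comments; names such as output_dir are placeholders.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# The README states the original DeBERTa tokenizer and the DeBERTa V2 XLarge codebase.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v2-xlarge")

def tokenize(batch):
    # 128-token sequence truncation; padding is deferred to the data collator
    # (dynamic padding per batch).
    return tokenizer(batch["text"], truncation=True, max_length=128)

# dataset = raw_dataset.map(tokenize, batched=True, remove_columns=["text"])  # placeholder

# Masking rate is not given in this excerpt; the collator default (15%) is assumed.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer)

args = TrainingArguments(
    output_dir="albertina-ptpt",       # placeholder
    per_device_train_batch_size=52,    # 52 samples per GPU, 8 GPUs
    gradient_accumulation_steps=2,     # assumed: 52 * 8 * 2 = 832 effective batch size
    learning_rate=1e-5,                # as reported
    lr_scheduler_type="linear",        # linear decay
    warmup_steps=10_000,               # 10k warm-up steps
    num_train_epochs=25,               # ~245k steps in total, per the README
)

# trainer = Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)
# trainer.train()
```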