jarodrigues committed on
Commit 19ecde1
1 Parent(s): e9a627b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -115,8 +115,8 @@ As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/micr
 
 To train **Albertina 1.5B PTBR 256**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence
 truncation and dynamic padding for 250k steps and a 256-token sequence-truncation for 80k steps.
-These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences, 24 hours of computation for the 256-token
-input sequences and 24 hours of computation for the 512-token input sequences.
+These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences and 24 hours of computation for the 256-token
+input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 <br>
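
The edited paragraph describes a two-phase pre-training schedule: 128-token sequence truncation with dynamic padding for 250k steps, then 256-token truncation for 80k steps, with a learning rate of 1e-5, linear decay, and 10k warm-up steps. Below is a minimal sketch of how the first phase could be configured with the Hugging Face `Trainer`; the tokenizer/model id, the masked-language-modeling objective, the placeholder corpus file, and the batch size are illustrative assumptions not stated in this commit.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumption: the "original DeBERTa tokenizer" refers to the DeBERTa V2 xxlarge checkpoint.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xxlarge")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v2-xxlarge")

# Placeholder corpus file; the actual PT-BR pre-training data set is not part of this diff.
dataset = load_dataset("text", data_files={"train": "ptbr_corpus.txt"})["train"]

def tokenize_128(batch):
    # Phase 1: truncate to 128 tokens; padding is deferred to the collator
    # so that batches are padded dynamically to the longest sequence in each batch.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize_128, batched=True, remove_columns=["text"])

# Masked language modeling is assumed as the pre-training objective here.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True)

args = TrainingArguments(
    output_dir="albertina-ptbr-128",
    max_steps=250_000,              # 250k steps for the 128-token phase
    learning_rate=1e-5,             # as stated in the README
    lr_scheduler_type="linear",     # linear decay
    warmup_steps=10_000,            # 10k warm-up steps
    per_device_train_batch_size=8,  # batch size is not given in this diff; placeholder value
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```

The second, 256-token phase would repeat the same setup with `max_length=256` and `max_steps=80_000`, resuming from the phase-one checkpoint.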