jarodrigues committed
Commit 34426d1 • Parent(s): 89e0682
Update README.md
README.md CHANGED
@@ -97,12 +97,12 @@ We skipped the default filtering of stopwords since it would disrupt the syntact
 As codebase, we resorted to the [DeBERTa V1 Base](https://huggingface.co/microsoft/deberta-base), for English.

 To train [**Albertina PT-PT Base**](https://huggingface.co/PORTULAN/albertina-ptpt-base), the data set was tokenized with the original DeBERTa tokenizer with a 128 token sequence truncation and dynamic padding.
-The model was trained using the maximum available memory capacity resulting in a batch size of 3072 samples (192 samples per GPU
+The model was trained using the maximum available memory capacity resulting in a batch size of 3072 samples (192 samples per GPU).
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 A total of 200 training epochs were performed resulting in approximately 180k steps.
 The model was trained for one day on a2-megagpu-16gb Google Cloud A2 VMs with 16 GPUs, 96 vCPUs and 1.360 GB of RAM.

-To train **Albertina PT-BR Base** we followed the same hyperparameterization as the Albertina
+To train **Albertina PT-BR Base** we followed the same hyperparameterization as the Albertina PT-PT Base model.
 The model was trained with a total of 150 training epochs resulting in approximately 180k steps.
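
For orientation, the corrected hyperparameters above map onto the Hugging Face `transformers` Trainer roughly as in the sketch below. This is a minimal illustration under stated assumptions, not the authors' training script: the corpus file `ptpt_corpus.txt`, the output directory, and the 15% masking probability are placeholders, and the reported run was distributed across 16 GPUs (192 samples per GPU for a global batch of 3072) rather than a single process.

```python
# Minimal sketch of the pre-training setup described in the README
# (original DeBERTa tokenizer, 128-token truncation, dynamic padding,
# lr 1e-5 with linear decay and 10k warm-up steps). The corpus file,
# output path and masking probability are assumptions, not the authors'.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# DeBERTa V1 Base codebase and tokenizer, as stated above.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")

# Placeholder corpus file; the actual PT-PT data set is not loaded like this.
dataset = load_dataset("text", data_files={"train": "ptpt_corpus.txt"})["train"]

def tokenize(batch):
    # Truncate to 128 tokens; padding is deferred to the collator so that
    # each batch is padded dynamically to its longest sequence.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic padding plus masked-LM label creation (15% masking is an assumption).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="albertina-ptpt-base",   # placeholder output path
    per_device_train_batch_size=192,    # 192 samples/GPU x 16 GPUs = 3072 global
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=200,               # ~180k steps in the reported run
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Padding in the collator rather than to a fixed length at tokenization time is what "dynamic padding" refers to here: each batch is only padded up to its own longest sequence, which saves compute on shorter samples.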