hdallatorre committed
Commit b66816c
1 Parent(s): 5e14a20

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -79,7 +79,7 @@ The masking procedure used is the standard one for Bert-style training:
 
 ### Pretraining
 
-The model was trained with 8 A100 80GB on 300B tokens, with an effective batch size of 1M tokens. The sequence length used was 1000 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
+The model was trained with 128 A100 80GB on 300B tokens, with an effective batch size of 1M tokens. The sequence length used was 1000 tokens. The Adam optimizer [38] was used with a learning rate schedule, and standard values for exponential decay rates and epsilon constants, β1 = 0.9, β2 = 0.999 and ε=1e-8. During a first warmup period, the learning rate was increased linearly between 5e-5 and 1e-4 over 16k steps before decreasing following a square root decay until the end of training.
 
 
 ### BibTeX entry and citation info
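
For readers curious about the schedule described in the changed line, here is a minimal Python sketch. It assumes the post-warmup "square root decay" is the common inverse-square-root form anchored at the end of warmup; the README does not spell out the exact formula, so the function name and decay expression below are illustrative, not taken from the training code. Only the constants (5e-5, 1e-4, 16k steps) come from the text.

```python
# Hypothetical sketch of the learning-rate schedule described in the README:
# linear warmup from 5e-5 to 1e-4 over 16k steps, then square-root decay.
# The exact decay formula is an assumption; only the constants are from the text.

def learning_rate(step: int,
                  warmup_steps: int = 16_000,
                  lr_start: float = 5e-5,
                  lr_peak: float = 1e-4) -> float:
    if step < warmup_steps:
        # Linear interpolation from lr_start up to lr_peak during warmup.
        return lr_start + (lr_peak - lr_start) * step / warmup_steps
    # Inverse-square-root decay, continuous with the warmup peak at 16k steps.
    return lr_peak * (warmup_steps / step) ** 0.5

if __name__ == "__main__":
    for step in (0, 8_000, 16_000, 64_000, 256_000):
        print(f"step {step:>7}: lr = {learning_rate(step):.2e}")
```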