pretrain dataset 9T
#4 opened by bpwl0121
Hi,
Thanks for your work! From the model card, you say:
"It is pre-trained for a total of 9 trillion tokens, consisting of a diverse assortment of English-based texts, 50+ natural languages and 40+ coding languages."
For a 340B model, you trained on only 9T tokens? For comparison, Llama 3 70B was trained on 15T tokens.
Best