Question on the training epoch
#1 by Tomohide - opened
Thank you for releasing this great model.
I have one question.
The "Training" section says that "The model was trained on around 312.5B tokens from Japanese CC-100, Japanese C4, and Japanese Wikipedia.." .
I think the total number of tokens in these corpora is about 180B, and so this statement means the training epoch is 1.73 epochs (= 312.5 / 180)?
Thank you in advance.
@Tomohide
You are welcome.
Since data processing, filtering, and resampling were applied to the training data, the exact token count may not match your assumption.
However, I believe the final dataset's token count is not far from 180B, so your estimate of about 1.73 epochs should be close enough.
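For reference, here is the back-of-the-envelope calculation as a minimal sketch; the 180B corpus figure is the questioner's assumption, not a number from the model card:

```python
# Rough epoch estimate: tokens seen during training / tokens in the training corpus.
trained_tokens = 312.5e9  # from the model card's "Training" section
corpus_tokens = 180e9     # assumed total for Japanese CC-100 + C4 + Wikipedia (question's estimate)

epochs = trained_tokens / corpus_tokens
print(f"Approximate epochs: {epochs:.2f}")  # ~1.74 (the thread rounds this down to 1.73)
```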
Tomohide changed discussion status to closed