How much data did you use?
#1, opened by kevinpro
365 * 16 = 5840
I noticed that you trained the model for 1 epoch (365 steps) with a batch size of 16.
Does that mean you only trained on 5840 samples?
We use sample packing, so it's 5840 sequences, each packed to 8096 tokens. Each sequence holds several samples, so more than 5840 samples were used.
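For readers unfamiliar with sample packing, here is a minimal sketch of the idea, not the authors' actual implementation. It assumes a simple sequential greedy strategy, and the function name `pack_lengths` is hypothetical. Only the 8096-token limit comes from the reply above.

```python
def pack_lengths(lengths, max_len=8096):
    """Greedily group sample token lengths into packs of at most max_len tokens.

    Hypothetical helper for illustration only; real training frameworks may
    use a different packing strategy.
    """
    packs = []
    current, used = [], 0
    for n in lengths:
        n = min(n, max_len)  # assumption: overlong samples are truncated
        if current and used + n > max_len:
            # The current pack cannot fit this sample, so start a new one.
            packs.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        packs.append(current)
    return packs


# Five samples end up in four packed sequences.
packs = pack_lengths([3000, 4000, 2000, 8096, 100])
print(len(packs), packs)
```

With many short samples, the number of packed sequences is far smaller than the number of samples, which is why the step count alone understates how much data was seen.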
Thank you for your explanation!
My understanding is that you used all the data: 76,338 + 669 + 6,206 samples. Due to packing, these samples were combined into 5,840 training sequences of 8,096 tokens each, i.e. 8,096 * 5,840 tokens in total. Please let me know if my understanding is correct.
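The arithmetic in this thread can be checked with a short sketch. It assumes that 16 is the effective batch size per step and that every packed sequence is completely full (no padding), so the token total and the per-sample average are upper bounds rather than reported figures.

```python
# Numbers quoted in this thread.
steps = 365
batch_size = 16          # assumption: effective batch size per optimizer step
seq_len = 8096           # tokens per packed sequence, as stated in the reply

packed_sequences = steps * batch_size          # 5840
total_tokens = packed_sequences * seq_len      # upper bound, assumes no padding
raw_samples = 76_338 + 669 + 6_206             # dataset sizes quoted above

# Implied average sample length if all samples fit exactly into the packs.
avg_tokens_per_sample = total_tokens / raw_samples

print(f"packed sequences: {packed_sequences}")
print(f"total tokens (upper bound): {total_tokens:,}")
print(f"raw samples: {raw_samples:,}")
print(f"avg tokens per sample (upper bound): {avg_tokens_per_sample:.1f}")
```

Under these assumptions, the 83,213 raw samples would average at most roughly 568 tokens each, which is consistent with many samples fitting into each 8,096-token sequence.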