How should data be packed?

#16
by shiyue - opened

Hey, for this model, BOS = EOS. So when packing texts together, should the layout be

EOS text 1 EOS EOS text 2 EOS EOS text 3...

OR

EOS text 1 EOS text 2 EOS text 3...

Thanks!

Microsoft org

Hi,
For this model, the second option is preferred for packing. For example, one packed training instance might look like:

<|endoftext|> This is the first text <|endoftext|> And this is the second text. <|endoftext|>
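A minimal sketch of the second layout, assuming the documents are already tokenized into lists of integer ids and that `eos_id` is the id of `<|endoftext|>` in the model's tokenizer. The names `pack_documents`, `eos_id` and `seq_len` are hypothetical and not part of any library. Each document is followed by exactly one EOS, so the EOS that ends one text also acts as the BOS of the next, and `EOS EOS` never appears.

```python
from typing import Iterable, List


def pack_documents(docs: Iterable[List[int]], eos_id: int, seq_len: int) -> List[List[int]]:
    """Pack tokenized docs as: EOS doc1 EOS doc2 EOS doc3 EOS ... (hypothetical helper).

    Assumption: eos_id is the tokenizer's id for <|endoftext|>; since BOS = EOS,
    a single shared separator between documents is enough.
    """
    # Leading EOS plays the role of BOS for the first document.
    stream: List[int] = [eos_id]
    for doc in docs:
        stream.extend(doc)
        # One EOS after each document; it also marks the start of the next one.
        stream.append(eos_id)
    # Split into fixed-length training instances, dropping an incomplete tail.
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


if __name__ == "__main__":
    docs = [[11, 12, 13], [21, 22], [31]]
    print(pack_documents(docs, eos_id=0, seq_len=4))
```

Here `pack_documents([[11, 12, 13], [21, 22], [31]], eos_id=0, seq_len=4)` returns `[[0, 11, 12, 13], [0, 21, 22, 0]]`, and the leftover `[31, 0]` is dropped. Dropping the remainder instead of padding it is only one possible choice; padding the last chunk and masking the pad positions would also work.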

bapatra changed discussion status to closed

Thanks!
