<s> token

#65
by Muennighoff - opened
BigScience Workshop org
edited Jul 28, 2022

The <s> token (bos token) is never used during pre-training, right? (@stas, maybe?)
AFAIK we only use </s> (eos token), and only sparingly, after documents.
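To make the eos-only setup concrete, here is a minimal sketch of document packing. It assumes documents are already tokenized, that an eos is appended after each document, that no <s> is prepended, and that the resulting stream is cut into fixed-length sequences. `EOS_ID = 2` and `pack_documents` are hypothetical names chosen for illustration, not taken from the actual training pipeline.

```python
from typing import List

# Hypothetical eos id; the real value depends on the tokenizer.
EOS_ID = 2


def pack_documents(docs: List[List[int]], seq_len: int) -> List[List[int]]:
    """Concatenate tokenized documents, appending EOS_ID after each one,
    then split the stream into fixed-length training sequences.
    A trailing remainder shorter than seq_len is dropped."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(EOS_ID)  # eos marks the document boundary; no bos is added
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


print(pack_documents([[10, 11, 12], [20, 21], [30, 31, 32, 33]], seq_len=4))
# → [[10, 11, 12, 2], [20, 21, 2, 30], [31, 32, 33, 2]]
```

In this layout the only special token the model ever sees is the eos, which is why <s> would effectively be unused during pre-training.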

I want to try using <s> as a sep token for fine-tuning. cc @TimeRobber
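One way this could look is the following minimal sketch, which assumes a `prompt <s> target </s>` layout where the loss is only computed on the target and the closing eos. The ids `BOS_ID = 1` and `EOS_ID = 2` are assumptions (check the actual tokenizer), and `build_example` is a hypothetical helper. `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so those label positions are skipped.

```python
from typing import List, Tuple

# Hypothetical special-token ids (assumption: verify against the real tokenizer).
BOS_ID = 1   # <s>, reused here as a prompt/target separator
EOS_ID = 2   # </s>
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids: List[int], target_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Return (input_ids, labels) laid out as: prompt <s> target </s>."""
    input_ids = prompt_ids + [BOS_ID] + target_ids + [EOS_ID]
    # Mask the prompt and the separator so no loss is computed on them;
    # only the target tokens and the final eos are supervised.
    labels = [IGNORE_INDEX] * (len(prompt_ids) + 1) + target_ids + [EOS_ID]
    return input_ids, labels


input_ids, labels = build_example([5, 6, 7], [8, 9])
print(input_ids)  # → [5, 6, 7, 1, 8, 9, 2]
print(labels)     # → [-100, -100, -100, -100, 8, 9, 2]
```

Masking the separator is a design choice: since the separator is always inserted by the data pipeline, there is little reason to train the model to predict it.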

BigScience Workshop org
edited Aug 1, 2022

"Never" is a strong word: if the pretraining dataset contains any literal <s> occurrences, they will be tokenized as <bos>. That said, I'd expect very few strings in the pretraining dataset to match it.
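To estimate how rare such matches are, one could scan the raw text for the literal special-token strings. This is a rough sketch under the assumption, taken from the reply above, that the tokenizer maps the verbatim string `<s>` to the bos token. It counts plain substring matches, so it only approximates what the real tokenizer produces (normalization or pre-tokenization could differ). `count_special_strings` is a hypothetical helper.

```python
from typing import Dict, Iterable

# Literal strings assumed to be matched verbatim as special tokens.
SPECIAL_STRINGS = ("<s>", "</s>")


def count_special_strings(docs: Iterable[str]) -> Dict[str, int]:
    """Count non-overlapping occurrences of each special string in raw text."""
    counts = {s: 0 for s in SPECIAL_STRINGS}
    for text in docs:
        for s in SPECIAL_STRINGS:
            counts[s] += text.count(s)
    return counts


print(count_special_strings(["Use <s> here", "end</s>", "<s><s>"]))
# → {'<s>': 3, '</s>': 1}
```

Note that `</s>` does not contain `<s>` as a substring, so the two counts do not overlap.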
