The input_id 0 corresponds to the PAD token, but input_ids 1 through 99 are unused tokens, like the ones in the title. Why do they exist? They seem like a waste of vocabulary size, and correspondingly of embedding matrix size and memory.
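To put a rough number on the memory concern, here is a minimal sketch of the arithmetic. The hidden size (768), vocabulary size (30522), and float32 storage are my assumptions for a BERT-base-style model, not values taken from this model's config, and `unused_embedding_bytes` is just a hypothetical helper name.

```python
def unused_embedding_bytes(num_unused, hidden_size, bytes_per_param=4):
    """Bytes taken by the embedding rows of the unused tokens."""
    return num_unused * hidden_size * bytes_per_param


num_unused = 99      # input_ids 1..99, as described above
hidden_size = 768    # assumption: BERT-base-style hidden size
vocab_size = 30522   # assumption: BERT-base-style vocabulary size

cost = unused_embedding_bytes(num_unused, hidden_size)  # float32 assumed
share = num_unused / vocab_size

print(f"{cost} bytes (~{cost / 1024:.0f} KiB), {share:.2%} of embedding rows")
```

Swapping in the real values from the model's config would give the actual cost for this checkpoint.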