Tokenizer EOS Token

#21
by saksham-lamini - opened

For the instruct model, there are two end tokens: an eot_id and an eos_id. Through the tokenizer interface, tokenizer.eos_token_id exposes only the eos_id. There doesn't seem to be a way to expose the eot_id token, which would be important for stopping criteria, etc.

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(xxx, eos_token_id=terminators)
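To make the intent of a terminator list concrete, here is a minimal sketch in plain Python (no transformers dependency) that cuts a sequence of generated IDs at the first terminator it encounters. The helper name truncate_at_terminator and the numeric IDs are assumptions for illustration only; the real IDs should be read from the tokenizer as in the snippet above.

```python
from typing import Iterable, List


def truncate_at_terminator(token_ids: List[int], terminators: Iterable[int]) -> List[int]:
    """Return the tokens that precede the first terminator (hypothetical helper)."""
    stop = set(terminators)
    for i, tok in enumerate(token_ids):
        if tok in stop:
            # Drop the terminator and everything after it.
            return token_ids[:i]
    # No terminator found: keep the sequence unchanged.
    return token_ids


# Placeholder IDs, assumed for this example; look up the real values
# with tokenizer.eos_token_id and tokenizer.convert_tokens_to_ids("<|eot_id|>").
EOS_ID = 128001
EOT_ID = 128009

generated = [9906, 1917, EOT_ID, EOS_ID, 42]
trimmed = truncate_at_terminator(generated, [EOS_ID, EOT_ID])
```

If only EOS_ID were listed, the sequence above would be cut one token later, with the eot_id token left in the output, which is why both IDs are passed as terminators.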

Meta Llama org

See also #4

pcuenq changed discussion status to closed
