Aug 12

ztgeng

Aug 17

I cannot reproduce. Can I see your code?

hpsun

Aug 19

•

edited Aug 19

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('Meta-Llama-3.1-405B-Instruct-FP8', use_fast=True)

one bos token id

inputs_id = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)
print(inputs_id)
#[128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 1627, 10263, 220, 2366, 19, 271, 128009, 128006, 882, 128007, 271, 9906, 0, 128009, 128006, 78191, 128007, 271]

two bos token ids

inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs_id = tokenizer(inputs)
print(inputs_id)
#{'input_ids': [128000, 128000, 128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 198, 15724, 2696, 25, 220, 1627, 10263, 220, 2366, 19, 271, 128009, 128006, 882, 128007, 271, 9906, 0, 128009, 128006, 78191, 128007, 271], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

hpsun changed discussion status to closed Aug 19

tanliboy

Aug 19

I encountered a similar issue before and resolved it by removing the bos token from the chat template.
In my experience, this didn't noticeably affect performance, but it did complicate customizing the attention/loss mask.

ztgeng

Aug 20

You can avoid it by adding "add_special_tokens=False" argument in the second call of the tokenizer.

inputs_id = tokenizer(inputs, add_special_tokens=False)

meta-llama
/

Llama-3.1-8B-Instruct

two BOS token id is right?

one bos token id

two bos token ids