Different tokenization result from the llama-models reference implementation

#4
by heheda - opened

The tokenization results of Meta's reference implementation and Hugging Face differ.
For the one-image request in Meta's reference (scripts/multimodal_example_chat_completion.py), the tokenization result is:
128000, 128006, 882, 128007, 271, 128256, 75885, 420, 2217, 304, 1403, 23719, 128009, 128006, 78191, 128007, 271,
While Hugging Face produces:
256, 128000, 256, 128006, 882, 128007, 271, 257, 128256, 262, 61885, 420, 2217, 304, 1403, 23719, 257, 128009, 262, 128006, 78191, 128007, 271, 220
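
To see where the two outputs diverge, the id lists can be decoded token by token. The snippet below is only a small sketch; it assumes the same Llama-3.2-11B-Vision-Instruct tokenizer files as in the test code further down are available under that name:

from transformers import AutoTokenizer

# Assumption: same local model directory / repo id as in the test code below.
tokenizer = AutoTokenizer.from_pretrained("Llama-3.2-11B-Vision-Instruct")

meta_ids = [128000, 128006, 882, 128007, 271, 128256, 75885, 420, 2217, 304,
            1403, 23719, 128009, 128006, 78191, 128007, 271]
hf_ids = [256, 128000, 256, 128006, 882, 128007, 271, 257, 128256, 262, 61885,
          420, 2217, 304, 1403, 23719, 257, 128009, 262, 128006, 78191, 128007, 271, 220]

# Print the token string behind every id so the extra ids in the
# Hugging Face output are visible side by side with the reference.
for name, ids in [("meta", meta_ids), ("hf", hf_ids)]:
    print(name, tokenizer.convert_ids_to_tokens(ids))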

Hugging Face test code:

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in two sentences"}
        ]
    }
]
# Render the chat template into a prompt string, with the assistant header appended.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
# print("text is:", text)

url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)

# Tokenize the prompt and preprocess the image, then inspect the token ids.
inputs = processor(text=text, images=raw_image, return_tensors="pt").to(model.device)
print("input_ids:", inputs['input_ids'])
output = model.generate(**inputs, do_sample=False, max_new_tokens=25)
print(processor.decode(output[0]))
Meta Llama org

Using the chat_template from 90B Instruct solves this issue; now I get [128000, 128006, 882, 128007, 271, 128256, 75885, 420, 2217, 304, 1403, 23719, 128009, 128006, 78191, 128007, 271], as in the sketch below.
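
For anyone who needs a workaround before the files are updated, a minimal sketch is to borrow the 90B Instruct template and pass it explicitly when rendering the prompt. This assumes the 90B Instruct repo is accessible and that passing chat_template to apply_chat_template overrides the bundled one:

from transformers import AutoProcessor, AutoTokenizer

processor = AutoProcessor.from_pretrained("Llama-3.2-11B-Vision-Instruct")

# Assumption: borrow the chat template shipped with 90B Instruct.
fixed_template = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-90B-Vision-Instruct"
).chat_template

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in two sentences"}
        ]
    }
]
# Pass the borrowed template explicitly instead of the one bundled with 11B.
text = processor.apply_chat_template(messages, add_generation_prompt=True, chat_template=fixed_template)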

Meta Llama org

Now fixed by this PR; I can get [128000, 128006, 882, 128007, 271, 128256, 75885, 420, 2217, 304, 1403, 23719, 128009, 128006, 78191, 128007, 271]
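
As a quick check after pulling the updated files, the prompt can be re-tokenized and compared against the reference ids quoted at the top of this thread. This is only a sketch, repeating the relevant parts of the test code above:

import requests
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Llama-3.2-11B-Vision-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in two sentences"}
        ]
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)

url = "https://llava-vl.github.io/static/images/view.jpg"
raw_image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=text, images=raw_image, return_tensors="pt")

# Reference ids from Meta's implementation, quoted at the top of this thread.
expected = [128000, 128006, 882, 128007, 271, 128256, 75885, 420, 2217, 304,
            1403, 23719, 128009, 128006, 78191, 128007, 271]
assert inputs["input_ids"][0].tolist() == expected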

Thank you very much!

Meta Llama org

Thank you @wukaixingxp 🙌 We'll update the template in the tokenizer_config.json file if needed, as that's the one used by the transformers tokenizer.
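
To confirm which template the transformers tokenizer actually picks up, the key can be read straight from the file. A small sketch, assuming the model files live in a local directory named as in the test code above:

import json

# Assumption: local directory matching the model_id used in the test code.
with open("Llama-3.2-11B-Vision-Instruct/tokenizer_config.json") as f:
    config = json.load(f)

# This is the template the transformers tokenizer uses for apply_chat_template.
print(config["chat_template"])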

Meta Llama org

Fixed, closing now.

pcuenq changed discussion status to closed