The missing 'chat_template' in tokenizer_config.json leads to incorrect generation.

#4
by devymex - opened

Fix:

...
  "add_prefix_space": null,
  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
...
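
If editing tokenizer_config.json by hand is inconvenient, the same template can also be attached at runtime through the tokenizer's chat_template attribute. A minimal sketch (the chat_template.jinja path is just an assumed local file holding the template string above):

from transformers import LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained('llava-hf/llava-v1.6-vicuna-7b-hf')

# Assumed local file containing the Jinja template string from the fix above
with open('chat_template.jinja') as f:
    processor.tokenizer.chat_template = f.read()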

With the above fix, the following code works fine.

import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
from PIL import Image

model_dir = 'llava-hf/llava-v1.6-vicuna-7b-hf'
device_name = 'cpu' # 'cuda:0'

device = torch.device(device_name)
processor = LlavaNextProcessor.from_pretrained(model_dir)

model = LlavaNextForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)
model.to(device)

image1 = Image.open('images/test.jpg')
conversation = [{
        "role": "user",
        "content": "<image>\nWhat's in the image?"
    }]
# Render the conversation into a prompt string using the chat template added above
text_prompt = processor.tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(text_prompt, [image1], return_tensors='pt').to(device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=False))
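
If you only want the model's reply without the echoed prompt, you can decode just the newly generated tokens. A small sketch reusing the inputs and output variables from the snippet above:

# Skip the prompt tokens and decode only what the model generated
prompt_len = inputs['input_ids'].shape[1]
print(processor.decode(output[0][prompt_len:], skip_special_tokens=True))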

Hi, I did this as you commented, but it still doesn't work.

Judge Input: <s><|start_header_id|>user<|end_header_id|>

Inpainted image: <image>
Masked image: <image>
Caption: a black and red mountain bike parked on the side of a building<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Judge Output: <|start_header_id|>user<|end_header_id|>

Inpainted image:  
Masked image:  
Caption: a black and red mountain bike parked on the side of a building<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The code is:

# judge
judge_prompt = f'Inpainted image: <image>\nMasked image: <image>\nCaption: {caption}'
judge_chat_history.append({
    'role': 'user',
    'content': judge_prompt
})
judge_input = llava_processor.tokenizer.apply_chat_template(judge_chat_history, tokenize=False)
print(f'Judge Input: {judge_input}')
images = [image, mask_image]
judge_input = llava_processor(text=judge_input, images=images, return_tensors='pt').to(llava_judge.device)
judge_output_id = llava_judge.generate(**judge_input, max_new_tokens=100)
judge_output = llava_processor.decode(judge_output_id[0], skip_special_tokens=True)
print(f'Judge Output: {judge_output}')
Llava Hugging Face org

Thanks for opening an issue. I will soon add chat templates to the LLaVa configuration files for easier formatting. In the meantime, your solution is a possible workaround.

I happened to find this amazing repo: https://github.com/chujiezheng/chat_templates/tree/main
I have tried the Vicuna version and it seems to work well.
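
For reference, a rough sketch of how a template file from that repo could be used without touching the config, assuming processor is the LlavaNextProcessor from the snippet above and 'chat_templates/vicuna.jinja' is an assumed path inside a local clone of the repo:

# Load a Jinja template from the cloned repo and pass it directly to apply_chat_template
with open('chat_templates/vicuna.jinja') as f:
    vicuna_template = f.read()
text_prompt = processor.tokenizer.apply_chat_template(
    [{'role': 'user', 'content': "<image>\nWhat's in the image?"}],
    chat_template=vicuna_template, tokenize=False, add_generation_prompt=True)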
