The missing 'chat_template' in tokenizer_config.json leads to incorrect generation.

#4
by devymex - opened

Fix:

...
  "add_prefix_space": null,
  "chat_template": "{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
...
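
If editing tokenizer_config.json by hand is inconvenient, the same template can also be attached at runtime through the tokenizer's chat_template attribute. A minimal sketch (the chat_template.jinja path is just an assumed local file holding the template string above):

from transformers import LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained('llava-hf/llava-v1.6-vicuna-7b-hf')

# Assumed local file containing the Jinja template string from the fix above
with open('chat_template.jinja') as f:
    processor.tokenizer.chat_template = f.read()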

With the above fix, the following code works fine.

import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
from PIL import Image

model_dir = 'llava-hf/llava-v1.6-vicuna-7b-hf'
device_name = 'cpu' # 'cuda:0'

device = torch.device(device_name)
processor = LlavaNextProcessor.from_pretrained(model_dir)

model = LlavaNextForConditionalGeneration.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)
model.to(device)

image1 = Image.open('images/test.jpg')
conversation = [{
        "role": "user",
        "content": "<image>\nWhat's in the image?"
    }]
# Render the conversation into a prompt string using the chat template added above
text_prompt = processor.tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(text_prompt, [image1], return_tensors='pt').to(device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=False))
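
If you only want the model's reply without the echoed prompt, you can decode just the newly generated tokens. A small sketch reusing the inputs and output variables from the snippet above:

# Skip the prompt tokens and decode only what the model generated
prompt_len = inputs['input_ids'].shape[1]
print(processor.decode(output[0][prompt_len:], skip_special_tokens=True))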

Hi, I did this as you commented, but it still doesn't work.

Judge Input: <s><|start_header_id|>user<|end_header_id|>

Inpainted image: <image>
Masked image: <image>
Caption: a black and red mountain bike parked on the side of a building<|eot_id|><|start_header_id|>assistant<|end_header_id|>


Judge Output: <|start_header_id|>user<|end_header_id|>

Inpainted image:  
Masked image:  
Caption: a black and red mountain bike parked on the side of a building<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The code is:

# judge
judge_prompt = f'Inpainted image: <image>\nMasked image: <image>\nCaption: {caption}'
judge_chat_history.append({
    'role': 'user',
    'content': judge_prompt
})
judge_input = llava_processor.tokenizer.apply_chat_template(judge_chat_history, tokenize=False)
print(f'Judge Input: {judge_input}')
images = [image, mask_image]
judge_input = llava_processor(text=judge_input, images=images, return_tensors='pt').to(llava_judge.device)
judge_output_id = llava_judge.generate(**judge_input, max_new_tokens=100)
judge_output = llava_processor.decode(judge_output_id[0], skip_special_tokens=True)
print(f'Judge Output: {judge_output}')
Llava Hugging Face org

Thanks for opening an issue. I will soon add chat templates to the LLaVa configuration files for easier formatting. In the meantime, your solution is a possible workaround.

I happened to find this amazing repo: https://github.com/chujiezheng/chat_templates/tree/main
I have tried the Vicuna version and it seems to work well.
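
For reference, a rough sketch of how a template file from that repo could be used without touching the config, assuming processor is the LlavaNextProcessor from the snippet above and 'chat_templates/vicuna.jinja' is an assumed path inside a local clone of the repo:

# Load a Jinja template from the cloned repo and pass it directly to apply_chat_template
with open('chat_templates/vicuna.jinja') as f:
    vicuna_template = f.read()
text_prompt = processor.tokenizer.apply_chat_template(
    [{'role': 'user', 'content': "<image>\nWhat's in the image?"}],
    chat_template=vicuna_template, tokenize=False, add_generation_prompt=True)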
