batch image caption

#7
by joelr23 - opened

Hi, is there any method to generate batch image captions?

joelr23 changed discussion title from batch prompt to batch image caption

Hi, you can pass a list of images and a list of text prompts; it should work. If something goes wrong, please let me know.

Here is my code:


from PIL import Image

# `model` and `processor` are assumed to be loaded beforehand
# (e.g. via AutoModelForVision2Seq / AutoProcessor).
def run_example(images):
    prompt = "<grounding> Describe this image in detail:"
    batch_prompts = [prompt] * len(images)
    inputs = processor(text=batch_prompts, images=images, return_tensors="pt")
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"][:, :-1],
        attention_mask=inputs["attention_mask"][:, :-1],
        img_features=None,
        img_attn_mask=inputs["img_attn_mask"][:, :-1],
        use_cache=True,
        max_new_tokens=128,
    )
    # Decode every sequence in the batch, not just the first one.
    generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    for generated_text in generated_texts:
        processed_text, entities = processor.post_process_generation(generated_text)
        print(processed_text)

img_path = '/prompts_data/snowman.jpg'
images = [Image.open(img_path)] * 3
run_example(images)

Then I get the following error:

Traceback (most recent call last):
  File "/cfs-nj-gameai/joelrliu/prompts_data/ko.py", line 39, in <module>
    generated_ids = model.generate(
  File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1739, in generate
    output = self.text_model.generate(
  File "/usr/miniconda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/miniconda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1596, in generate
    return self.greedy_search(
  File "/usr/miniconda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2444, in greedy_search
    outputs = self(
  File "/usr/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1362, in forward
    outputs = self.model(
  File "/usr/miniconda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1068, in forward
    hidden_states = self.forward_embedding(
  File "/root/.cache/huggingface/modules/transformers_modules/kosmos-2-patch14-224/modeling_kosmos2.py", line 1010, in forward_embedding
    inputs_embeds[img_input_mask.to(dtype=torch.bool)] = img_features
RuntimeError: shape mismatch: value tensor of shape [3, 64, 2048] cannot be broadcast to indexing result of shape [192, 2048]

It seems the shapes mismatch, so I tried to fix it with a reshape:

    inputs_embeds[img_input_mask.to(dtype=torch.bool)] = img_features.reshape(-1, img_features.shape[-1])

The error is gone, but I get unexpected caption results...
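For context, a minimal sketch of why flattening resolves the broadcast error (the shapes come from the traceback above; the tensors here are random stand-ins, not Kosmos-2 internals):

```python
import torch

# Stand-in shapes from the traceback: a batch of 3 images,
# 64 image tokens each, hidden size 2048.
batch, n_img_tokens, hidden = 3, 64, 2048
img_features = torch.randn(batch, n_img_tokens, hidden)

# Boolean-mask assignment selects 3 * 64 = 192 positions, so the
# value tensor must be 2-D with shape [192, 2048]; a 3-D tensor of
# shape [3, 64, 2048] cannot be broadcast to it.
inputs_embeds = torch.zeros(batch, 128, hidden)
img_input_mask = torch.zeros(batch, 128, dtype=torch.bool)
img_input_mask[:, :n_img_tokens] = True

inputs_embeds[img_input_mask] = img_features.reshape(-1, hidden)
```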

Thanks for opening this issue @joelr23. There are indeed some problems when using batches. I will take a deeper look.

Hello again! I made a small change, and it should be able to run with batch examples now.

[Note!] The current code snippet (the [:, :-1] part below) won't work with batch examples if any padding happens! But in your case there is no padding, so it's fine.

inputs["input_ids"][:, :-1]

There is an ongoing effort to port Kosmos-2 directly into transformers. This repository (remote code) might need more bug fixes later, including some breaking changes.
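A minimal sketch of why that slice breaks under padding (illustrative token ids with a hypothetical pad id of 1; plain lists stand in for tensors):

```python
PAD = 1  # hypothetical pad token id

# Two tokenized prompts, right-padded to the same length.
batch_ids = [
    [0, 5, 6, 7, 8],      # full-length prompt
    [0, 5, 6, PAD, PAD],  # shorter prompt, padded
]

# Dropping the last column removes the real token 8 from the first
# sequence but only a pad token from the second, so the two prompts
# are truncated inconsistently.
trimmed = [seq[:-1] for seq in batch_ids]
```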

Thanks for your response! Using view works!

joelr23 changed discussion status to closed
joelr23 changed discussion status to open
joelr23 changed discussion status to closed
