Why does the result differ from liuhaotian/llava-v1.5-7b?

#8 opened by wk21

Hi, thank you so much for providing this.

I tested some inference with your LLaVA version, but it answers something different from the result of liuhaotian/llava-v1.5-7b. The output is the same as the input.

What is the difference? I'm confused by it. Are there any changes to the model in your version?

Thanks

Also, I get the following error; do you know the reason? It happens every time:

"Processing on val: 0%| | 0/500 [00:00<?, ?it/s]/databricks/python/lib/python3.10/site-packages/torch/nn/modules/conv.py:459: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
return F.conv2d(input, weight, bias, self.stride,
Processing on val: 40%|███▉ | 198/500 [09:38<14:41, 2.92s/it]
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
"

Llava Hugging Face org

Hi @wk21
Thanks for the issue. Can you share exactly how you get the error above (https://huggingface.co/llava-hf/llava-1.5-7b-hf/discussions/8#657eec1c792970912818a52d)? https://github.com/huggingface/transformers/pull/28032 might fix it, but I couldn't reproduce it at all.

Llava Hugging Face org

> I tested some inference with your LLaVA version, but it answers something different from the result of liuhaotian/llava-v1.5-7b.

Thanks for testing! Do you use the same prompt template for both models? For reference, see the sketch below.
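A minimal sketch of the llava-hf inference path, assuming the transformers LLaVA API (AutoProcessor + LlavaForConditionalGeneration); the image path is a placeholder:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

# The <image> placeholder must appear in the text prompt; the processor
# expands it into the image tokens the model was trained with.
prompt = "USER: <image>\nPlease describe the image in detail\nASSISTANT:"
image = Image.open("example.jpg")  # hypothetical path

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
output = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```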

I just do image captioning, and it produces some different text.

Actually, with your version I used a prompt like "USER: <image>\nPlease describe the image in detail\nASSISTANT:" and

output = model.generate(**inputs, max_new_tokens=300, do_sample=False)

As for the "<image>" token, with liuhaotian/llava-v1.5-7b I used a prompt like "<image>\nPlease describe the image in detail\nASSISTANT:", and it uses

output = model.generate(
    input_ids,
    images=images_tensor,
    do_sample=False,
    max_new_tokens=300,
)

But the outputs are different.
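One likely source of the mismatch (a guess, not confirmed in this thread): the original repo's eval scripts build the prompt through its conversation templates, which prepend a system prompt, so the raw strings fed to the two models end up different. A sketch of what the llava_v1 template produces, assuming the llava package from the liuhaotian/LLaVA repo:

```python
from llava.conversation import conv_templates

# Build the prompt the way the original repo's run scripts do.
conv = conv_templates["llava_v1"].copy()
conv.append_message(conv.roles[0], "<image>\nPlease describe the image in detail")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())
# Prints the system prompt ("A chat between a curious human and an
# artificial intelligence assistant. ...") followed by
# "USER: <image>\n... ASSISTANT:", which is not the same string as the
# bare prompt above, so greedy decoding can diverge.
```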

Also, when will https://github.com/huggingface/transformers/pull/28032 be available in transformers?

Did you solve the "RuntimeError: CUDA error: device-side assert triggered"? I got the same one.
