llava-hf/llava-1.5-7b-hf · convert llava-v1.5-7b to liuhaotian/llava-v1.5-7b-hf format

deleted

May 14, 2024

Thank you for your outstanding work. I recently fine-tuned a LLaVA model starting from liuhaotian/llava-v1.5-7b, and I now want to serve it with vLLM to improve inference speed. vLLM expects checkpoints in the llava-1.5-7b-hf format. If I load my llava-v1.5-7b checkpoint directly with vLLM, I get the error "Model architectures ['LlavaLlamaForCausalLM'] are not supported for now", so the conversion is required. How do I convert my fine-tuned llava-v1.5-7b model to the llava-1.5-7b-hf format, and how was that format produced in the first place?
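For readers hitting the same error, a quick way to see which format a checkpoint folder is in is to read the "architectures" field of its config.json. The sketch below is only illustrative: the helper name checkpoint_format is made up, "LlavaLlamaForCausalLM" is taken from the error message above, and "LlavaForConditionalGeneration" is assumed to be the class name that the llava-hf checkpoints declare.

```python
import json
from pathlib import Path

# Architecture name written by the original LLaVA repo (from the error message above).
ORIGINAL_ARCH = "LlavaLlamaForCausalLM"
# Architecture name assumed for the Transformers port (llava-hf checkpoints).
HF_ARCH = "LlavaForConditionalGeneration"


def checkpoint_format(model_dir):
    """Return 'original', 'hf', or 'unknown' based on config.json in model_dir."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    architectures = config.get("architectures") or []
    if ORIGINAL_ARCH in architectures:
        return "original"  # needs conversion before vLLM can load it
    if HF_ARCH in architectures:
        return "hf"
    return "unknown"
```

Calling checkpoint_format on the fine-tuned folder should return "original" before conversion and "hf" afterwards, if the assumptions above hold.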

nielsr

Llava Hugging Face org May 14, 2024

Hi,

We recommend using the conversion script, found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py.

However, I also recommend verifying the logits after conversion, using the same inputs for both models. I noticed that the original LLaVa model pads images, whereas the image processor in Transformers doesn't yet.
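One way to do the logits check is to dump the logits from both models for the same prompt and image and compare them numerically. The snippet below is a minimal sketch, not part of the conversion script: the helper name compare_logits and the tolerance of 1e-3 are assumptions, and it expects both inputs as arrays of identical shape (for example sequence length by vocabulary size).

```python
import numpy as np


def compare_logits(reference, converted, atol=1e-3):
    """Compare two logits arrays of identical shape.

    Returns the largest absolute difference, the fraction of positions whose
    argmax token agrees, and whether the difference is within atol.
    """
    ref = np.asarray(reference, dtype=np.float64)
    new = np.asarray(converted, dtype=np.float64)
    if ref.shape != new.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {new.shape}")
    max_abs_diff = float(np.max(np.abs(ref - new)))
    # Fraction of positions where both models pick the same top token.
    argmax_agreement = float(np.mean(ref.argmax(axis=-1) == new.argmax(axis=-1)))
    return {
        "max_abs_diff": max_abs_diff,
        "argmax_agreement": argmax_agreement,
        "close": max_abs_diff <= atol,
    }
```

With PyTorch models, the logits can be converted first with something like logits.detach().cpu().float().numpy(). A shape mismatch usually means the two models did not receive the same input IDs, which is worth ruling out before looking at the weights.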

deleted

May 15, 2024

Thank you for your reply. I'll give it a try later. If successful, I'll update the instructions here.

nielsr

Llava Hugging Face org May 15, 2024

Btw, I just uploaded a fine-tuning notebook for LLaVa with Transformers here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa/Fine_tune_LLaVa_on_a_custom_dataset_(with_PyTorch_Lightning).ipynb

red-fox-yj

May 18, 2024

Hello, have you succeeded? If so, could you briefly tell me what to do? Thank you for your reply.

deleted

May 20, 2024

Yes, following the instructions at the link nielsr provided works. The steps outlined there are very detailed.

deleted

May 20, 2024

The link you provided works, thank you. Also, do you happen to know how the llava-next project is fine-tuned? The official documentation does not provide specific fine-tuning code (https://github.com/LLaVA-VL/LLaVA-NeXT/).

RaushanTurganbay

Llava Hugging Face org May 20, 2024

LLaVa-NeXT is very similar to LLaVa and can be fine-tuned with the same script by adding a few changes.

I edited the provided notebook to adapt for LLaVa-NeXT: Colab Notebook

deleted

May 20, 2024

Great, thank you for your work. However, in fact, I am more interested in the model fine-tuning process for llava-next-video. Do you have any suggestions? Or could you create a similar Jupyter notebook for fine-tuning?

RaushanTurganbay

Llava Hugging Face org May 20, 2024

We haven't added LLaVa-NeXT-Video to transformers yet

Among video LLMs, Video-LLaVa is available, and I am working on adding a fine-tuning script for it. I will let you know here when it's ready.

RaushanTurganbay

Llava Hugging Face org May 22, 2024

@Dengxiaoyu, I added a tutorial on tuning Video-LLaVa in this Colab notebook

deleted

May 23, 2024

Thank you for your enthusiastic help. If possible, I would also appreciate it if you could write fine-tuning code for llava-next-video.

RaushanTurganbay

Llava Hugging Face org May 23, 2024

It has not been added to transformers yet. We are planning to add Llava-Next-Video and create notebooks for it next month.

AmazDeng

May 31, 2024

May I ask if there are any plans for transformers to support Llava-Next-Video?

RaushanTurganbay

Llava Hugging Face org May 31, 2024

According to our last conversation with the authors, they want to release a better version before adding it to transformers. You can track the issue here

JackBAI

Aug 27, 2024

After conversion I found that the output logits are different. What might be the problem?

nielsr

Llava Hugging Face org Aug 28, 2024

It might be because of the image preprocessing settings. Make sure to double-check that you are forwarding exactly the same pixel values and input IDs through both models.

The original implementation applies padding to the images, which is not present in the Transformers library.
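As a rough illustration of the padding step, the sketch below pads an RGB image to a square before it is handed to the image processor. It is written from my understanding of the original repository and should be treated as an assumption: the helper name pad_to_square is made up, the image is assumed to be centered on the square canvas, and the fill color is assumed to be the CLIP image mean scaled to 0-255.

```python
import numpy as np

# CLIP normalization mean; assumed to be the source of the padding color.
CLIP_IMAGE_MEAN = (0.48145466, 0.4578275, 0.40821073)


def pad_to_square(image, image_mean=CLIP_IMAGE_MEAN):
    """Pad an H x W x 3 uint8 image to a square, centering the original content."""
    image = np.asarray(image, dtype=np.uint8)
    height, width = image.shape[:2]
    if height == width:
        return image
    side = max(height, width)
    # Background color: per-channel mean scaled to 0-255 and truncated to int.
    fill = np.array([int(c * 255) for c in image_mean], dtype=np.uint8)
    canvas = np.empty((side, side, 3), dtype=np.uint8)
    canvas[...] = fill
    top = (side - height) // 2
    left = (side - width) // 2
    canvas[top:top + height, left:left + width] = image
    return canvas
```

A PIL image can be converted with np.asarray(img.convert("RGB")) before the call and turned back with Image.fromarray afterwards. Whether this reproduces the original pixel values exactly should still be confirmed by comparing the resulting tensors from both pipelines.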

JackBAI

Aug 28, 2024

Yes, I just confirmed that this is the cause. Anyone else who runs into this problem should check their image preprocessing.

nielsr

Llava Hugging Face org Aug 28, 2024

Could you open an issue on the Transformers library? I had an implementation that matches the original 100%, so we could update the image processor.

JackBAI

Aug 28, 2024

Done, please see https://github.com/huggingface/transformers/issues/33175.

ha1772007

Aug 30, 2024

Can you provide the unconverted weights of the Llama part that is used for qwen-interleave-0.5B, for single-image or multi-image input?
https://huggingface.co/llava-hf/llava-1.5-7b-hf/discussions/26#66436cdfbf8f506d97a36a41

RaushanTurganbay

Llava Hugging Face org Aug 30, 2024

@ha1772007 you mean the original weights? They can be found here (https://huggingface.co/collections/lmms-lab/llava-next-interleave-66763c55c411b340b35873d1)

MilchstraB

Nov 24, 2024

@JackBAI What should I do to get the same image preprocessing as the original LLaVA when using the Transformers library? It looks like a do_pad parameter was added to control how the image is processed, but I can't find the corresponding code in the main branch of the Transformers library.