model type llava_mistral is unrecognised

#1
by shshwtv - opened

ValueError: The checkpoint you are trying to load has model type llava_mistral but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

How can I add this model type to Transformers?
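For context, the error appears when the checkpoint is loaded through the generic Transformers auto classes, because the installed Transformers release has no llava_mistral architecture registered. A minimal reproduction sketch (the Hub repo id is my assumption):

from transformers import AutoConfig

# The installed Transformers release does not know the "llava_mistral" model type,
# so resolving the config already raises the ValueError quoted above.
config = AutoConfig.from_pretrained("microsoft/llava-med-v1.5-mistral-7b")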

Microsoft org

Hi - please check our repo (https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#contents) for instructions on using LLaVA-Med v1.5.

Hi, thanks for your response. I want to fine-tune your base model on my own data. Please let me know whether this will be possible in the near future. Thank you!

Hi, did you solve the problem? I ran into the same issue where Transformers does not recognize this architecture. I downloaded the model files and am running them offline.

No, it's not possible. We switched to the original LLaVA model.

Hello. I want to fine-tune LLaVA-Med on my own dataset. Is that possible? Did you find a solution?

I can load it successfully; the steps are as follows:

  1. Clone the repository from https://github.com/microsoft/LLaVA-Med and create a virtual environment.
  2. Download the weights from this repository.
  3. Use the following code to load the model:
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path='<path_to_downloaded_repository(this)>',
    model_base=None,
    model_name='llava-med-v1.5-mistral-7b',
)

Then I can use this model like any other model in the Hugging Face Transformers library.
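To sketch how the loaded pieces fit together, here is a rough inference example. The helper names (conv_templates, process_images, tokenizer_image_token) and the "mistral_instruct" template follow the upstream LLaVA code that LLaVA-Med forks, so treat them as assumptions and compare against the scripts in the LLaVA-Med repo:

import torch
from PIL import Image

from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path='<path_to_downloaded_repository(this)>',
    model_base=None,
    model_name='llava-med-v1.5-mistral-7b',
)

# Build a single-turn prompt; "mistral_instruct" is an assumption about the right template.
conv = conv_templates["mistral_instruct"].copy()
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nDescribe the findings in this image.")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

# Preprocess the image and splice the image token into the prompt ids.
image = Image.open("example.png").convert("RGB")
image_tensor = process_images([image], image_processor, model.config).to(model.device, dtype=torch.float16)
input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=256, do_sample=False)

# Depending on the repo version, generate may also return the prompt tokens;
# strip them before decoding if the prompt shows up in the output.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True).strip())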

Thanks for the update @mizukiQ

Also, I wanted to ask: what is the maximum image resolution that can be used? ViT-L/14 supports 224 x 224.
And what are the various strategies for handling CT/MR images?

I am also happy to join a Discord or Zoom call to catch up and exchange notes with other builders in the space.

Best,
Shash

Thank you!
I also want to ask how I should prepare my dataset. I have images and captions. How should I convert them for fine-tuning LLaVA-Med?
Is there any tutorial?

I am not one of the official LLaVA-Med researchers, but here is some configuration from their code:
LLaVA-Med uses CLIPImageProcessor to handle images; its crop_size is (336, 336), and the image is split into a (24, 24) grid of patches, where each patch is 14 x 14.
LLaVA-Med handles images from different modalities (CT or MR) in the same way.
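If you want to check those numbers yourself, the processor config can be inspected directly; a small sketch, assuming the vision tower is the 336px CLIP ViT-L/14 checkpoint:

from transformers import CLIPImageProcessor

# Assumed vision tower checkpoint; adjust it if the vision tower entry in the
# model's config points elsewhere.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
print(processor.crop_size)  # {'height': 336, 'width': 336}
print(processor.size)       # shortest edge is resized to 336 before the centre crop
# 336 / 14 = 24, so the ViT sees a 24 x 24 grid of 14 x 14 pixel patches.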

I published some data-loading code here (the code is from my reproduction of LLaVA-Med, so it may not be polished).
You can replace the SlakeDataset class with your own dataset class (since image captioning and VQA have the same form, I + Q -> T); just keep the two interfaces the same and fine-tune the model.
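In case it helps with the conversion step, here is one way to turn plain image-caption pairs into the conversation-style JSON used by the LLaVA instruction-tuning scripts; the field names follow the upstream LLaVA format, so double-check them against the data format expected by the repo you fine-tune with:

import json
from pathlib import Path

# Hypothetical input: (image filename, caption) pairs from your own dataset.
pairs = [
    ("chest_xray_001.png", "Frontal chest radiograph with no acute findings."),
    ("brain_mri_002.png", "Axial T2-weighted MRI showing a small hyperintense lesion."),
]

records = []
for idx, (image_file, caption) in enumerate(pairs):
    records.append({
        "id": f"caption_{idx}",
        "image": image_file,  # path relative to the image folder passed to the training script
        "conversations": [
            # "<image>\n" marks where the image tokens are spliced into the prompt.
            {"from": "human", "value": "<image>\nDescribe the image."},
            {"from": "gpt", "value": caption},
        ],
    })

Path("my_caption_data.json").write_text(json.dumps(records, indent=2))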
