How to convert new LLaVA model into HF format?

#9
by zzxslp - opened

Hi, I trained LLaVA-1.5-13b on a customized dataset. How do I convert my saved model and config into HF format? My model is here: https://huggingface.co/zzxslp/som-llava-v1.5-13b; it should use the same settings as https://huggingface.co/liuhaotian/llava-v1.5-13b.

I tried renaming the keys in the safetensors files to follow the format of this model checkpoint, but couldn't load the model correctly. For example, the vocab size in this repo is 32064, while the original LLaVA-1.5 used 32000.

Llava Hugging Face org

Hey!

You should be able to convert LLaVA weights to HF format using this script. First clone transformers, then run:

git clone https://github.com/huggingface/transformers
python transformers/src/transformers/models/llava/convert_llava_weights_to_hf.py --text_model_id lmsys/vicuna-13b-v1.5 --vision_model_id openai/clip-vit-large-patch14-336 --output_hub_path zzxslp/som-llava-v1.5-13b-hf --old_state_dict_id zzxslp/som-llava-v1.5-13b
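Under the hood, the conversion script remaps the original LLaVA state-dict keys onto the HF Llava module layout. A minimal sketch of that idea, with an illustrative mapping (the real, complete mapping lives in convert_llava_weights_to_hf.py; the entries below are examples, not the full list):

```python
# Illustrative subset of an old-key -> new-key mapping; the authoritative
# mapping is defined inside convert_llava_weights_to_hf.py.
KEY_MAP = {
    "model.mm_projector": "multi_modal_projector",
    "model.layers": "language_model.model.layers",
    "lm_head": "language_model.lm_head",
}

def rename_key(key, key_map):
    # Replace each matching old substring with its HF-style name.
    for old, new in key_map.items():
        if old in key:
            key = key.replace(old, new)
    return key

print(rename_key("model.mm_projector.0.weight", KEY_MAP))
# maps the projector weights into the HF "multi_modal_projector" module
```

If a checkpoint fails to load after conversion, comparing the converted key names against `model.state_dict().keys()` of a known-good HF Llava checkpoint is a quick way to spot a mismatch.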

Hi there, thanks for the reply! I've converted my model into HF format by slightly modifying the script you provided.

Still, one question: why do we need to expand the vocab from 32000 to 32064? Also, the original image_token_index in LLaVA is set to -200, while in the HF model it is <image>: 32000.

Llava Hugging Face org

Yes, we expand the vocab size by adding an "image" token and a "pad" token, but the final vocab size is padded to 32000 + 64 = 32064 for hardware computation efficiency reasons. Having tensor dimensions that are multiples of 64 in "float16" precision on A100 GPUs can speed up tensor multiplications. More on that here
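The padding rule above is just rounding up to the nearest multiple of 64: the base vocab of 32000 plus the two new tokens gives 32002, which rounds up to 32064. A small sketch of that arithmetic:

```python
def pad_vocab(vocab_size, multiple=64):
    # Round the vocab size up to the nearest multiple (64 here) so the
    # embedding and lm_head matrices have GPU-friendly dimensions.
    return ((vocab_size + multiple - 1) // multiple) * multiple

# 32000 base tokens + <image> + <pad> = 32002, padded up to 32064
print(pad_vocab(32000 + 2))  # -> 32064
```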

And for the "image_token_index", I guess it was done for ease of tokenization in transformers, since a negative value like "-200" cannot be a valid token id.
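To illustrate the difference: the original LLaVA code splices a sentinel value (-200) into the input ids and handles it specially, while HF registers "<image>" as a real token at the end of the vocab so the standard tokenizer pipeline just works. A toy illustration (the tiny vocab here is made up for the example):

```python
IMAGE_TOKEN_INDEX = -200            # original LLaVA sentinel, not a real token id

# Toy vocab standing in for the real 32000-entry one
vocab = {f"tok{i}": i for i in range(5)}
vocab["<image>"] = len(vocab)       # HF style: a real id appended to the vocab

def encode(tokens, vocab):
    # Every token must map to a non-negative id usable as an embedding index;
    # a sentinel like -200 could never index an embedding row directly.
    return [vocab[t] for t in tokens]

ids = encode(["tok0", "<image>", "tok1"], vocab)
print(ids)  # the <image> token gets an ordinary in-vocab id
```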
