Guidance on How to Train / Finetune Model

#6
by mnbucher - opened

Hi there.

I'm currently trying to set up code to fine-tune the model on my own vision/language dataset. I started by looking at the original code repository on GitHub, but then switched over to Hugging Face here, since it might be easier to set up the full training pipeline using the HF API.

I can't find any information on how to properly encode the input for training...

build_conversation_input_ids is just for inference, but for training we need to encode both the query and the intended text output, plus labels, masks, and so on. I'm now digging into the codebase of https://huggingface.co/THUDM/cogvlm-chat-hf/blob/main/modeling_cogvlm.py to better understand the details, but I wanted to check whether the authors of CogVLM could provide some guidance here, at least in the README?
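To make it concrete, here is the direction I'm currently exploring. This is just my own sketch based on how causal-LM finetuning usually works, not anything official: reuse the inference-time encoding for the query, append the tokenized answer, and mask the query positions out of the loss with -100.

```python
import torch

# My own sketch, not official CogVLM code. Assumes the answer can simply be
# appended as language tokens after the prompt produced for inference.
def build_training_inputs(model, tokenizer, query, answer, image):
    # Encode the prompt (query + image) exactly as for inference.
    inputs = model.build_conversation_input_ids(
        tokenizer, query=query, history=[], images=[image]
    )
    # Tokenize the target answer and close it with EOS.
    answer_ids = torch.tensor(
        tokenizer(answer, add_special_tokens=False)["input_ids"]
        + [tokenizer.eos_token_id]
    )
    input_ids = torch.cat([inputs["input_ids"], answer_ids])
    # Compute the loss only on the answer: prompt positions get -100.
    labels = torch.cat(
        [torch.full_like(inputs["input_ids"], -100), answer_ids]
    )
    return {
        "input_ids": input_ids,
        # 0 appears to be the language token type in modeling_cogvlm.py
        "token_type_ids": torch.cat(
            [inputs["token_type_ids"], torch.zeros_like(answer_ids)]
        ),
        "attention_mask": torch.ones_like(input_ids),
        "images": inputs["images"],
        "labels": labels,
    }
```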

Thanks a lot!

Best,
Martin

Curious if you were able to set up a finetuning pipeline using HF? I'm trying to finetune CogVLM with LoRA on my custom image-text datasets and couldn't find any info in the original repository on which linear layers to target, etc. The SAT implementation is a bit obscure, so ideally I'd like to use PEFT.
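In case it's useful, this is the direction I've been experimenting with. The target module names are just my reading of modeling_cogvlm.py (the vision/language "expert" linear layers), so treat them as unverified:

```python
from peft import LoraConfig, get_peft_model

# Sketch only: module names taken from my reading of modeling_cogvlm.py,
# where attention and MLP are split into vision/language "expert" linears.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "vision_expert_query_key_value",
        "vision_expert_dense",
        "language_expert_query_key_value",
        "language_expert_dense",
    ],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check the trainable fraction
```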

Were you able to do it?

Hi both @sidnb13 @mohammednuruddin,
I haven't continued working on this since then, so if you have any running code snippet, that would be very valuable.
— cheers, martin

@sidnb13 I would like to do the same as you are doing, did you succeed at finetuning using PEFT?

Hello everyone. I am also trying to finetune the chat version of CogVLM on an image-text dataset. My text data are QA pairs about each image, stored in JSON format. Any idea how I might preprocess the data for finetuning?
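For concreteness, this is the loading sketch I have so far; the file name and field names are from my own data, so adapt them as needed:

```python
import json
from PIL import Image

# Illustrative JSON layout (my own data):
# [{"image": "imgs/0001.jpg", "question": "...", "answer": "..."}, ...]
with open("qa_pairs.json") as f:
    records = json.load(f)

examples = []
for rec in records:
    image = Image.open(rec["image"]).convert("RGB")
    # build_training_inputs is the per-example encoding helper sketched
    # earlier in this thread (prompt + answer tokens, prompt masked to -100).
    examples.append(
        build_training_inputs(
            model, tokenizer,
            query=rec["question"], answer=rec["answer"], image=image,
        )
    )
```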

Can I finetune with --quant 4 so it fits in 16 GB VRAM, even if it's a bit slower?

@expert78 I am working in an 8×A100 80 GB GPU environment and I still get out-of-memory issues. The 4-bit quantized version might help, especially when you are not making all layers trainable.
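For the 4-bit load itself, the standard bitsandbytes config would look roughly like this (untested with CogVLM on my side; see the inv_freq error reported below):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard QLoRA-style 4-bit load; whether CogVLM tolerates it is untested.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
```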

I am getting an error at this point and couldn't find a solution:
File "/home/ec2-user/trail_ver/llava_train.py", line 121, in
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 562, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3504, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3919, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 802, in _load_state_dict_into_meta_model
or (not hf_quantizer.check_quantized_param(model, param, param_name, state_dict))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 124, in check_quantized_param
if isinstance(module._parameters[tensor_name], bnb.nn.Params4bit):
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'inv_freq'

Can anyone help me out?

Hi @mnbucher ,

I am trying to finetune the model with LoRA. I just want to know how I should feed the data (image and text) through the tokenizer; I'm pretty confused by that part.

processor_data = model.build_conversation_input_ids(tokenizer, query=text, history=[], images=[image])
Is it possible to do it with this? I get some input_ids, padded masks, attention masks, and images. I also add labels to it, but then I get an error:
Error: pyarrow.lib.ArrowInvalid: Column 7 named input_ids expected length 10 but got length 1290

Could you please guide me through this step?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

I'm sorry for the inconvenience. We indeed do not have example code for finetuning the Hugging Face model in our demo. In our finetune demo on GitHub, we have released code based on the SAT framework, which can be used to finetune the cogvlm_224 and cogvlm_490 SAT models. You might want to check our GitHub: https://github.com/THUDM/CogVLM

Regarding the input_ids issue, it is most likely caused by a lack of padding: the examples in a batch have different lengths, so you need to pad them to a common length before training.
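For example, a collate function along these lines pads a batch to a common length (a sketch that assumes each example is the dict returned by build_conversation_input_ids with an added labels tensor):

```python
import torch

def collate_fn(batch, pad_token_id):
    # Right-pad every sequence in the batch to the longest example.
    max_len = max(ex["input_ids"].size(0) for ex in batch)

    def pad(seq, value):
        return torch.cat([seq, seq.new_full((max_len - seq.size(0),), value)])

    return {
        "input_ids": torch.stack([pad(ex["input_ids"], pad_token_id) for ex in batch]),
        "token_type_ids": torch.stack([pad(ex["token_type_ids"], 0) for ex in batch]),
        # Padding positions are masked out of attention...
        "attention_mask": torch.stack([pad(ex["attention_mask"], 0) for ex in batch]),
        # ...and out of the loss, via the -100 ignore index.
        "labels": torch.stack([pad(ex["labels"], -100) for ex in batch]),
        # One list of image tensors per example.
        "images": [ex["images"] for ex in batch],
    }
```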

Dear all,

is there any news about examples of the fine-tuning process?

TNX!!!

No
