Guidance on How to Train / Finetune Model

by mnbucher - opened Nov 29, 2023

Nov 29, 2023

•

edited Feb 14, 2024

Hi there.

I'm currently trying to set up the code to fine-tune the model on my own vision/language dataset, and I started by looking at the original code repository on Github, but then switched over to Huggingface here as it might be easier to set up the full training pipeline using the HF API.

I can't find any information on how to properly encode the input for training...

build_conversation_input_ids is just for inference, but for training we need to encode both the query and the intended text output + a bunch of other stuff. I'm now digging into the codebase of https://huggingface.co/THUDM/cogvlm-chat-hf/blob/main/modeling_cogvlm.py to better understand the details, but just wanted to check if the authors of CogVLM could maybe provide some guidance here, at least in the README?

Thanks a lot!

Best,
Martin

sidnb13

Dec 3, 2023

Curious if you were able to set up a finetuning pipeline using HF? I'm trying to finetune CogVLM using LoRA on my custom image-text datasets and couldn't find any info on what linear layers to target, etc. from the original repository. The implementation using SAT is a bit obscure so ideally I'd like to use PEFT.

mohammednuruddin

Dec 18, 2023

•

edited Dec 18, 2023

Curious if you were able to set up a finetuning pipeline using HF? I'm trying to finetune CogVLM using LoRA on my custom image-text datasets and couldn't find any info on what linear layers to target, etc. from the original repository. The implementation using SAT is a bit obscure so ideally I'd like to use PEFT.

Were you able to do it?

mnbucher

Dec 18, 2023

hi both @sidnb13 @mohammednuruddin ,
i haven't continued working on this since then, so if you have any running code snippet that might be very valuable.
— cheers, martin

z3ugma

Dec 22, 2023

@sidnb13 I would like to do the same as you are doing, did you succeed at finetuning using PEFT?

ayensujeremiah

Jan 22, 2024

Hello Everyone. I am also trying to finetune the chat version of CogVLM too for image-text dataset. My text set are QA pairs regarding the image in a json format. any idea how I might preprocess the data for the finetuning?

expert78

Jan 26, 2024

can i Finetuning with --quant 4
so it fit on 16gb vram even if it slow a bit?

ayensujeremiah

Jan 28, 2024

@expert78 I am working with 8*A100 80GB GPU environment and I still get Out of Memory issues. Maybe the 4 bit quantized version might help, special when you are not making all layers trainable.

KIK99

Jan 30, 2024

I am getting an error at this point.

Could find a solution
File "/home/ec2-user/trail_ver/llava_train.py", line 121, in
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 562, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3504, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3919, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/modeling_utils.py", line 802, in _load_state_dict_into_meta_model
or (not hf_quantizer.check_quantized_param(model, param, param_name, state_dict))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/anaconda3/lib/python3.11/site-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 124, in check_quantized_param
if isinstance(module._parameters[tensor_name], bnb.nn.Params4bit):
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'inv_freq'

Can anyone help me out

KIK99

Feb 13, 2024

Hi @mnbucher ,

I am trying to finetune the model with LORA, I just want to know how should I input the data from tokenizer like image and text. pretty confused with that part.

processor_data = model.build_conversation_input_ids(tokenizer, query=text, history=[], images=[image])
is it possible to do it with this. i get some input_ids and padded_masks, attention masks and images. Also I add labels as well to it but I get an error
Error : pyarrow.lib.ArrowInvalid: Column 7 named input_ids expected length 10 but got length 1290

Could you please guide me through this step?

zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org Feb 14, 2024

I'm sorry for the inconvenience. In our demo, we indeed do not have example code for finetuning based on the huggingface model. In our finetune demo on GitHub, we have released code based on the SAT framework, which can be used to finetune the cogvlm_224 and cogvlm_490 SAT models. You might want to check our GitHub: https://github.com/THUDM/CogVLM

Regarding the input id issue, it is foreseeable that it's caused by the lack of padding. You need to pad the inputs for better training.

crux82

Apr 5, 2024

Dear all,

is there any news about examples of the fine-tuning process?

TNX!!!

mohammednuruddin

Apr 19, 2024

Dear all,

is there any news about examples of the fine-tuning process?

TNX!!!

anothercoder2

Jun 29, 2024

Hello Everyone,
Noob here. I want to finetune cogagent on a few websites for specific tasks.
Is there a sample dataset I could look at to see what can I data set I need to create.
Thanks in advance!!

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment