How to finetune this model

#1
by huoyuan - opened

Has anyone trained this model on local datasets? What kind of GPU do we need to fine-tune it?

I have the same question

I have the same question; please share the training dataset.

I'm trying to use LoRA from PEFT to fine-tune this model, and it seems to be working.

Could you share the code?

I tried to use LoRA too, but the results seem weird.
Could you please share your settings, especially how you process the input text and target text? I think something is wrong with my code.

Sure! I put the code here: https://github.com/thaumstrial/FinetuneGLMWithPeft
Feel free to share your ideas and suggestions.

However, I changed the original code to train ClueAI/ChatYuan-large-v1 instead. The current code is rewritten from memory, so some pieces may be missing.
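
The basic shape is something like this (a minimal sketch from memory, so it may not match the repo exactly; everything beyond the model name and the target module is an assumption):

```python
# Minimal sketch from memory -- not guaranteed to match the repo.
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "THUDM/chatglm-6b"  # or ClueAI/ChatYuan-large-v1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).half().cuda()

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,        # see the TaskType discussion below
    target_modules=["query_key_value"],  # ChatGLM fuses q/k/v into one projection
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the LoRA adapters should be trainable
```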

Thanks for sharing!
I have a question: why do you use TaskType.CAUSAL_LM in the LoRA config instead of TaskType.SEQ_2_SEQ_LM? I think ChatGLM is a seq2seq model (given the transformers class AutoModelForSeq2SeqLM).

That's what got me confused. I also thought ChatGLM was a seq2seq model, but if you use TaskType.SEQ_2_SEQ_LM, ChatGLM's forward function turns out to be missing some of the parameters that PEFT passes in.

I don't know if ChatGLM has changed their code yet, but I did start with TaskType.SEQ_2_SEQ_LM and changed ChatGLM's code (https://huggingface.co/THUDM/chatglm-6b/blob/main/modeling_chatglm.py). Does TaskType.SEQ_2_SEQ_LM work when you try it now?

You can see that the forward function in modeling_chatglm.py is missing some encoder-related parameters.

I think I had the same situation with TaskType.SEQ_2_SEQ_LM before. I just edited ChatGLM's source code where the function was missing parameters: after some debugging I added **kwargs and it worked.
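
If you'd rather not edit modeling_chatglm.py, the same idea can be shown as a thin wrapper (illustrative only, not the actual edit I made; the argument names are assumptions about what PEFT passes in):

```python
import torch

class DropExtraKwargs(torch.nn.Module):
    """Accept and ignore encoder-style arguments that ChatGLM's forward()
    does not declare -- the same effect as appending **kwargs to forward()
    in modeling_chatglm.py."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # kwargs may contain e.g. decoder_input_ids / decoder_attention_mask,
        # which are silently dropped here.
        return self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
```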

I tried your code a few hours ago and it still didn't work; the fine-tuning makes the model worse. I guess the data is not being handled the right way.

Can you share some of your training hyperparameters? In my test I only used a few examples, to convince ChatGLM that it isn't a robot, but I set the lr and batch size very high: lr between 1e-2 and 1e-3, batch size around 10, and no warmup.

num batches: 16 (summed across all GPUs)
warmup: none
lr: 3e-3
LoRA config:

- target_modules: ["query_key_value"]
- r: 8
- lora_alpha: 32
- lora_dropout: 0.1

I fine-tuned the model on a small dataset too (about 700 examples). The same settings work well on bloomz-7b1-mt and gpt-neox-20b. (The config is written out as code below.)
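
Written out in PEFT terms (the optimizer is my assumption; only the lr, warmup, and LoRA settings above were actually fixed):

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = get_peft_model(model, LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
))

# lr 3e-3, no warmup; the effective batch size of 16 is the sum over all GPUs
# and comes from the DataLoader / distributed setup, which is not shown here.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad],  # LoRA adapters only
    lr=3e-3,
)
```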

Could you please share your code? What special handling did you do for the sentences?

Sorry for the late reply. My code is integrated with other code at my company, so it is troublesome to extract. I guess you are Chinese? You can follow this issue to see what I did with the dataset: https://github.com/mymusise/ChatGLM-Tuning/issues/11#issuecomment-1474880311. I think your code has a similar problem.
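
For anyone who cannot open that issue, a common way to handle the input/target split is to mask the prompt out of the loss. A generic sketch (field names and lengths are assumptions, and ChatGLM's own special tokens, e.g. gMASK/BOS, are glossed over here; see the linked issue for the exact handling):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

def build_features(example, max_source_len=256, max_target_len=256):
    # "prompt" / "response" are assumed field names for your local dataset.
    prompt_ids = tokenizer.encode(example["prompt"],
                                  max_length=max_source_len, truncation=True)
    target_ids = tokenizer.encode(example["response"],
                                  max_length=max_target_len, truncation=True,
                                  add_special_tokens=False)

    input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id]
    # Only the response should contribute to the loss, so the prompt positions
    # are masked out with -100 in the labels.
    labels = [-100] * len(prompt_ids) + target_ids + [tokenizer.eos_token_id]
    return {"input_ids": input_ids, "labels": labels}
```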

I tried running the training code and it went out of memory on Google Colab Pro+, exceeding 40 GB of GPU RAM. Has anyone faced a similar issue?

Check out my blog if you understand Chinese: 《对 ChatGLM-6B 做 LoRA Fine-tuning》 (LoRA fine-tuning of ChatGLM-6B).

Thank you! I will definitely try it out. I previously used another sample of LoRA fine-tuning code for GLM and the memory usage went above 40 GB.
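
For what it's worth, the usual tricks to get memory down are half precision, gradient checkpointing, and training only the LoRA adapters -- a sketch (no guarantee it fits a given Colab GPU, and the gradient-checkpointing lines assume the ChatGLM code you have supports it):

```python
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model.config.use_cache = False            # the KV cache is useless during training
model.gradient_checkpointing_enable()     # trade compute for activation memory
model.enable_input_require_grads()        # needed for checkpointing with frozen embeddings

model = get_peft_model(model, LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
))
model.print_trainable_parameters()        # a few million params instead of 6B
```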

I have a question though, @aizpy. Based on your blog, am I right to say that after fine-tuning you saved all the LoRA weights for k, q, and v?

@AegisGPT Yes, all the parameters that require gradient updates are saved, and those are exactly the parameters that LoRA touches.
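
Concretely, with PEFT the saved checkpoint contains only the adapter weights; a hand-rolled equivalent just filters by requires_grad (a sketch; the helper name is mine):

```python
import torch
from peft import PeftModel

def save_lora_only(model: PeftModel, out_dir: str) -> None:
    # PeftModel.save_pretrained writes only the adapter weights
    # (adapter_config.json + adapter_model.bin), i.e. exactly the
    # parameters that required gradients during fine-tuning.
    model.save_pretrained(out_dir)

    # Hand-rolled equivalent: keep just the tensors that were trainable.
    trainable = {n: p.detach().cpu()
                 for n, p in model.named_parameters() if p.requires_grad}
    torch.save(trainable, f"{out_dir}/trainable_params.pt")
```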

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Why not try P-Tuning? Only 7 GB of GPU RAM is needed: https://github.com/THUDM/ChatGLM-6B/blob/main/ptuning/README.md
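
If you prefer to stay inside PEFT rather than the official ptuning scripts, prefix tuning is a rough analogue (a sketch only, not the official recipe; the hyperparameters are assumptions, and it is not guaranteed that ChatGLM's custom modeling code works with PEFT's prefix tuning out of the box):

```python
from transformers import AutoModel
from peft import PrefixTuningConfig, TaskType, get_peft_model

model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = get_peft_model(model, PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=128,   # plays roughly the role of PRE_SEQ_LEN in the official scripts
))
model.print_trainable_parameters()
```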
