LoraConfig's target_modules with peft?

#10
by Handgun1773 - opened

Hello, thanks for the great work, I'm excited to see how it performs compared to llama, bloom, redpajama, … when fine-tuned with my datasets.
However, I can't run training: if I don't specify target_modules in my LoRA config, I get an error, and if I specify 'q_proj' and 'v_proj' like with llama, I also get an error.

Could you tell me what should be put under target_modules for a LoraConfig object from the peft library?


After looking at model.named_modules() from llama and btlm, I decided to go with ["c_proj", "c_attn"], and so far the fine-tuning is working. Both q_proj/v_proj in llama and c_proj/c_attn seem to be linked to attention, but I'm really not sure what I'm doing...

Edit: fine-tuning done, it seems to work quite well on my first small dataset.
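
For reference, a minimal LoraConfig along those lines might look like this (a sketch; everything except target_modules is an illustrative default rather than something tuned for BTLM):

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    # BTLM's fused attention projection and output projection
    target_modules=["c_attn", "c_proj"],
    r=16,             # illustrative rank
    lora_alpha=32,    # illustrative scaling
    lora_dropout=0.05,
    bias="none",
)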

Cerebras org

Hi @Handgun1773 , thanks for your interest in BTLM.
The c_attn and c_proj modules in BTLM match [q_proj, k_proj, v_proj] and o_proj in LLaMA, respectively.

Hi @Handgun1773 , since you've managed to fine-tune it, may I ask which token you used for padding?
I've assumed the usual tokenizer.pad_token = tokenizer.eos_token.

> I've assumed the usual tokenizer.pad_token = tokenizer.eos_token.

Yes. It seems all the special tokens are the same anyway:

{
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "unk_token": "<|endoftext|>"
}
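
So setting the pad token is just the usual assignment (a minimal sketch, assuming the stock transformers AutoTokenizer API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("cerebras/btlm-3b-8k-base")
# BTLM ships no dedicated pad token, so reuse <|endoftext|>
tokenizer.pad_token = tokenizer.eos_token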

@Handgun1773 Did you need any patch to the model to get peft working?

Using trl (with your suggested target_modules):

python trl/examples/scripts/sft_trainer.py --use_peft --load_in_8bit --model_name "cerebras/btlm-3b-8k-base" --dataset_name timdettmers/openassistant-guanaco --batch_size=4 --gradient_accumulation_steps 2

I get this error:

File "~/.cache/huggingface/modules/transformers_modules/cerebras_btlm-3b-8k-base/modeling_btlm.py", line 942, in forward
    hidden_states *= torch.tensor(
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

I got the same error, and I'm not sure how I resolved it.
Maybe try: model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)
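
Roughly where that call would sit (a sketch, assuming 8-bit loading via bitsandbytes and peft's get_peft_model; not a verbatim copy of my script):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "cerebras/btlm-3b-8k-base",
    load_in_8bit=True,        # needs bitsandbytes installed
    device_map="auto",
    trust_remote_code=True,
)
# The suggestion above: skip the gradient-checkpointing part of the k-bit prep
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", target_modules=["c_attn", "c_proj"]))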

Anyway, I couldn't make it work with TRL because of a parameter defined in TRL that I couldn't change (I don't remember which one, either), so I use the regular trainer and do the TRL things manually.

> Anyway, I couldn't make it work with TRL because of a parameter defined in TRL that I couldn't change (I don't remember which one, either), so I use the regular trainer and do the TRL things manually.

This is exciting, especially the long context window.
Could you please share your QLoRA fine-tuning script, or at least point to some resources one could use? Thank you.

I based it heavily on this: https://youtu.be/DcBC4yGHV4Q

I also recommend using DeepSpeed. For a single GPU, you only have to add your DeepSpeed config file to the trainer args:

...
transformers.TrainingArguments(
    deepspeed=DEEPSPEED_CONFIG_FILE,
    ...
)

Plus launch it like this:
ACCELERATE_USE_DEEPSPEED=true PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 deepspeed your_finetune_script.py --deepspeed ds_config_3.json
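
A ds_config_3.json along those lines might look roughly like this (a sketch of a ZeRO stage-3 config with CPU offload, not necessarily the exact file used here; the "auto" values are filled in from TrainingArguments by the HuggingFace integration):

{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto"
}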

With this I can train on my 3060 with a 1000-token max context where it would otherwise OOM. I have a QLoRA run going on the GPT-4 part of the dolphin dataset; sadly I forgot to add the EOS token, so it won't shut up. I don't care enough to re-run it since it was an experiment, and it would block my GPU for more than a month to do a few epochs. I also don't have money to throw at rented hardware, sadly. The dataset is really massive.

For multi-GPU, I managed to get it to work, but it took too much trial and error to explain; if you go that route and hit an error, I may be able to point you in the right direction.
Edit: never mind, I just got a RuntimeError: CUDA error: an illegal memory access was encountered. 😔
