What is the best way to run inference after LoRA fine-tuning with the PEFT approach?

#70 opened by Pradeep1995

Here is the SFTTrainer setup I used for fine-tuning Mistral:

trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field="column name",
    max_seq_length=3000,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
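
For reference, peft_config, training_arguments, and packing are defined elsewhere in my script; a minimal sketch of what they could look like (the values here are illustrative assumptions, not my actual settings):

from peft import LoraConfig
from transformers import TrainingArguments

# Assumed LoRA config; rank, alpha, and target modules are illustrative only
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Assumed training arguments; tune for your own hardware and dataset
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=25,
    save_strategy="steps",
    save_steps=500,
    optim="paged_adamw_8bit",
    bf16=True,
)

# Whether SFTTrainer packs multiple short samples into one sequence
packing = False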

I found several different ways to run inference on the fine-tuned model after PEFT-based LoRA fine-tuning.

Method - 1

Save the adapter after training completes, then merge it with the base model and use the merged model for inference:

trainer.model.save_pretrained("new_adapter_path")

import torch
from peft import PeftModel

finetuned_model = PeftModel.from_pretrained(
    base_model,
    "new_adapter_path",
    torch_dtype=torch.float16,
    is_trainable=False,
    device_map="auto",
)
finetuned_model = finetuned_model.merge_and_unload()
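
If you go this route, a natural follow-up (a sketch, not part of my original setup; the save path below is just a placeholder) is to persist the merged weights and tokenizer once, so that later inference needs only plain transformers and no PEFT dependency:

# Save the merged model and tokenizer once (placeholder path)
finetuned_model.save_pretrained("merged_model_path")
tokenizer.save_pretrained("merged_model_path")

# Later, load it like any ordinary causal LM
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "merged_model_path",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("merged_model_path")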

Method - 2

Save checkpoints during training, then load the checkpoint with the lowest loss:

from peft import PeftModel

finetuned_model = PeftModel.from_pretrained(
    base_model,
    "least loss checkpoint path",
    torch_dtype=torch.float16,
    is_trainable=False,
    device_map="auto",
)
finetuned_model = finetuned_model.merge_and_unload()

Method - 3

The same approach, but loading with the AutoPeftModelForCausalLM class:

from peft import AutoPeftModelForCausalLM

finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    "output directory checkpoint path",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda",
)
finetuned_model = finetuned_model.merge_and_unload()

Method - 4

Use AutoPeftModelForCausalLM, pointing at the output folder itself rather than a specific checkpoint:

instruction_tuned_model = AutoPeftModelForCausalLM.from_pretrained(
    training_args.output_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
finetuned_model = instruction_tuned_model.merge_and_unload()

Method - 5

Any of the above methods, but without merging the adapter into the base model:

#finetuned_model = finetuned_model.merge_and_unload()
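
For the un-merged case, a minimal generation sketch (the prompt and generation settings are just examples, not from my setup) showing that the PeftModel wrapper applies the adapter on the fly during generate:

import torch

prompt = "Explain LoRA in one sentence."  # example prompt only
inputs = tokenizer(prompt, return_tensors="pt").to(finetuned_model.device)
with torch.no_grad():
    output_ids = finetuned_model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Skipping the merge keeps the adapter easy to swap out, at the cost of a small extra latency per forward pass compared with a merged model.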

Which method should I actually use for inference, and when should one be preferred over another?

I use "Method 1" and it works fine always. Better to save adapter checkpoints which are smaller in size and merge for once with base model rather than saving entire base model checkpoints everytime.

By the way, can you share a sample notebook for fine-tuning? I was using this one: https://colab.research.google.com/drive/1VDa0lIfqiwm16hBlIlEaabGVTNB3dN1A?usp=sharing

But my training loss starts to increase after 1000 steps for some reason. Any ideas? I'm running on a custom dataset and have tried both the Alpaca and Mistral templates, although that shouldn't matter much for fine-tuning, I guess.

@sumegh try a lower learning rate, which should reduce the loss. Do you have any idea how to select the max_steps parameter?

max_steps only matters if training a full epoch is not feasible for you; otherwise just set num_train_epochs = 1. If you do use it, work out the total number of steps in one epoch from your dataset size and effective batch size, then set max_steps below that.
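
As a rough sketch of that calculation (all numbers below are made up):

# Illustrative numbers only: steps per epoch from dataset size and batch settings
num_examples = 20_000                    # training examples in the dataset
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 16

steps_per_epoch = num_examples // effective_batch_size  # 1250
max_steps = 1000                         # choose something below steps_per_epoch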

Can you share your fine-tuning notebook for reference?

Sharing the notebook isn't possible for security reasons; it's confidential at my organization's level.

Okay, no issues. Also, what optimizer are you using? I was doing 4-bit LoRA fine-tuning, using the paged_adamw_8bit optimizer from the Hugging Face training config.

I am using paged_adamw_32bit.
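
For reference, the optimizer is just a string passed to TrainingArguments; a minimal sketch with a lowered learning rate (the exact values here are assumptions, not my actual config):

from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",
    optim="paged_adamw_32bit",      # paged 32-bit AdamW from bitsandbytes
    learning_rate=5e-5,             # lower than the 2e-4 often used for QLoRA
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
)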

@sumegh I'm facing a similar issue with increasing loss during fine-tuning. Were you able to resolve it with a lower learning rate?

No @mlkorra, lowering the learning rate made the model converge to a sub-optimal minimum. It doesn't diverge anymore, but the model doesn't learn much.
Let me know if you figure something out.
