What is the correct way to store the adapters after PEFT fine-tuning?

#67
by Pradeep1995

I am fine-tuning the Mistral model with the following configuration:

from transformers import TrainingArguments
from trl import SFTTrainer

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=13000,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
)

trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field=" column name",
    max_seq_length=3000,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

trainer.train()

During training, multiple checkpoints are written to the specified output directory output_dir.

Once training is over, I can save the model using

trainer.save_model()

I can also save the final model using

trainer.model.save_pretrained("path")

So I am a bit confused. What is the correct way to store the adapter after PEFT-based LoRA fine-tuning?

Is it:
1 - take the lowest-loss checkpoint folder from the output_dir
or
2 - save the adapter using

trainer.save_model()

or
3 - this method

trainer.model.save_pretrained("path")

Hi @Pradeep1995
Thanks for the issue. If you want the model with the lowest loss (i.e. the "best" model), I would advise going for the first option. Otherwise, options 2 and 3 achieve the same goal and save the final checkpoint. Note that you can also call trainer.push_to_hub(), and the trained adapters will be pushed to the Hub under your namespace together with the training logs.
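
For reference, a minimal sketch of what options 2 and 3 look like in practice (the "final_adapter" path is a hypothetical example):

# Both calls write only the adapter files (adapter_model.safetensors and adapter_config.json)
trainer.save_model("final_adapter")             # option 2
trainer.model.save_pretrained("final_adapter")  # option 3, equivalent for a PEFT-wrapped model

# Optionally push the adapter and training logs to the Hub
trainer.push_to_hub()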

@ybelkada If I want the model with the lowest loss (i.e. the "best" model), I would go for the first option - the checkpoints.
So do the checkpoints act as the adapters?

Yes, if you inspect the checkpoint folders, you should see adapter_model.safetensors and adapter_config.json files. Those contain the adapter weights and configuration.
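
If it helps, here is a minimal sketch of loading a checkpoint folder as an adapter (the base model ID and checkpoint path are assumptions, substitute your own):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the same base model that was fine-tuned (assumed ID)
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach the adapter weights stored in the chosen checkpoint folder (hypothetical path)
model = PeftModel.from_pretrained(base_model, "output_dir/checkpoint-13000")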

@ybelkada Thanks. So if I use the lowest-loss checkpoint folder as the adapter, should I then merge that checkpoint folder with the base model using

merge_and_unload()

or can I use the checkpoint folder directly for inference without merging?
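
(For what it's worth, a minimal sketch of the two options being asked about, continuing from the loading sketch above; merge_and_unload() folds the LoRA weights into the base weights so the result no longer needs the adapter files.)

# Option 1: use the checkpoint folder directly; the adapter stays attached at inference time
# (this is the `model` returned by PeftModel.from_pretrained above)

# Option 2: merge the adapter into the base model and save a standalone model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_model")  # hypothetical output path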
