Fine-tuning and DPO

#2 by agershun - opened

Could you share your thoughts on these questions:

  • Is it possible to do minor fine-tuning and DPO training with this network?
  • Which packages are best to use?
  • How much GPU memory do I need for LoRA with this network? Is an A100 40GB enough?

Thank you!

Yes, it will fine-tune great. I recommend using Hugging Face TRL.
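
For anyone landing here later, a minimal sketch of what LoRA supervised fine-tuning with TRL can look like. The model id (the base 7B Starling is used as a stand-in for the model under discussion), dataset path, and hyperparameters are illustrative assumptions, and exact argument names can differ between TRL versions:

```python
# Sketch: LoRA supervised fine-tuning with Hugging Face TRL (assumed setup).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model_name = "berkeley-nest/Starling-LM-7B-alpha"  # assumed model id

# Assumed JSONL dataset with a "text" column already in the chat prompt format.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model_name,            # TRL loads the model and tokenizer from the id
    train_dataset=dataset,
    peft_config=peft_config,     # wraps the model with LoRA adapters
    args=SFTConfig(
        output_dir="starling-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
    ),
)
trainer.train()
```

On the memory question: with LoRA adapters and a small per-device batch size plus gradient accumulation, a single A100 40GB is typically enough for a model in this size range; 4-bit loading (QLoRA via bitsandbytes) reduces the footprint further.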

It trains fine, thank you!

I adapted the code from this article and then modified the prompts to the Starling-LM format.
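
For reference, a small sketch of that prompt formatting, using the OpenChat-style template Starling-LM was trained on; the helper name and example turns are made up for illustration:

```python
# Sketch: format a conversation into the Starling-LM (OpenChat-style) prompt.
def to_starling_prompt(turns):
    """turns: list of (role, text) pairs with role in {"user", "assistant"}."""
    prefix = {"user": "GPT4 Correct User: ", "assistant": "GPT4 Correct Assistant: "}
    prompt = "".join(f"{prefix[role]}{text}<|end_of_turn|>" for role, text in turns)
    # End with the assistant prefix so the model generates the reply.
    return prompt + "GPT4 Correct Assistant:"

print(to_starling_prompt([("user", "Is it possible to fine-tune this model with LoRA?")]))
```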


Nice work, I'm glad you figured it out. Let me know if you have any questions. Thanks for your support!

I finished with DPO. It also works fine with this model.
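
A minimal sketch of preference tuning with TRL's DPOTrainer, following on from the LoRA setup above; the dataset path, model id, and hyperparameters are assumptions, and argument names can differ between TRL versions:

```python
# Sketch: DPO preference tuning with TRL on top of LoRA adapters (assumed setup).
from datasets import load_dataset
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

model_name = "berkeley-nest/Starling-LM-7B-alpha"  # assumed model id

# Assumed preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model=model_name,
    train_dataset=dataset,
    peft_config=peft_config,
    args=DPOConfig(
        output_dir="starling-dpo",
        beta=0.1,                        # strength of the preference constraint
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```

When a LoRA peft_config is passed, the trainer can use the adapter-disabled base weights as the implicit reference model, so a second full copy of the model usually isn't needed in memory.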
