Training/finetuning code?

#5
by milsunone - opened

Can you share the DPO finetuning code you used, or an ETA on when the code will be available?

Hugging Face H4 org

Hello @milsunone, we'll be releasing the DPO training code soon in the Alignment Handbook we're working on: https://github.com/huggingface/alignment-handbook

In the meantime, you can adapt the script from TRL, which is quite similar to what we'll release: https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py
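
For anyone adapting a script before the official code lands, it may help to see where `beta` enters the DPO objective. Below is a minimal per-example sketch in plain Python. It is not the training code for this model and not TRL's implementation. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and each log-probability is assumed to be the summed log-probability of a full response.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss (hypothetical helper, for illustration only).

    beta=0.1 is an illustrative value, not the setting used for this model.
    """
    # How much more the policy likes each response than the frozen reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # beta scales the preference margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = softplus(-margin), computed stably.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
```

The loss falls below log(2) once the policy raises the chosen response relative to the rejected one compared with the reference model, and a larger `beta` makes the loss react more strongly to the same margin.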

Great, could you also share which datasets were used during fine-tuning? They would be a great reference for learning how to fine-tune :)

How did you set the beta in DPO?
