DPO Dataset Qs

#2
by RonanMcGovern - opened

Thanks for making this model!

Three Qs:

  • I understand teknium/OpenHermes-2.5 was used to originally do SFT on Mistral to get the base SFT model. But what dataset was used for the DPO itself?
  • I notice that teknium/OpenHermes-2.5 has "from" instead of "role" and "value" instead of "content"... so I suppose you swapped those when formatting the chat data for the DPO (and SFT) steps?
  • How did you decide what hyperparams to use for the DPO?

Yes, I have the same doubts.

I'd also love to know about the datasets used for DPO in case we can help improving them!

Sign up or log in to comment