DPO Version

#5
by KnutJaegersberg - opened

How would this model behave if one applied the UltraLM DPO training to it?
