G-reen's Collections

ORPO vs. DPO vs. SFT + Training Loss Curves; argilla/dpo-mix-7k

Several models trained to compare the differences between each method. Each model includes a complete description of its hyperparameters, along with wandb reports.
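
For reference, below is a minimal sketch of how one of these preference-tuning runs could be set up with Hugging Face TRL on argilla/dpo-mix-7k. The base model, beta, and output path are placeholders, not the collection's actual hyperparameters, and exact dataset column handling may vary across TRL versions.

```python
# Minimal DPO training sketch on argilla/dpo-mix-7k using TRL.
# All hyperparameters here are placeholders, not this collection's settings.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B"  # placeholder; swap in the base model being compared
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# dpo-mix-7k provides "chosen"/"rejected" conversation pairs; recent TRL
# versions accept this conversational preference format directly.
dataset = load_dataset("argilla/dpo-mix-7k", split="train")

config = DPOConfig(
    output_dir="dpo-mix-7k-run",  # placeholder path
    beta=0.1,                     # placeholder; not necessarily the value used here
    report_to="wandb",            # log the training loss curves to wandb
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Swapping `DPOConfig`/`DPOTrainer` for TRL's `ORPOConfig`/`ORPOTrainer` (or `SFTConfig`/`SFTTrainer` with a plain text dataset) gives the other two runs being compared.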