unsloth/mistral-7b-v0.2-bnb-4bit (Text Generation)
Several trained models to compare the differences between each training method. Each model has a complete description of its hyperparameters, with wandb reports.
Note: All training runs were done on this model (4-bit QLoRA). Go Unsloth! A minimal loading sketch follows these notes.
Note: This entire dataset was used for training. For SFT, the rejected part of the dataset was ignored (see the conversion sketch after these notes).
Note: The image shows a comparison between all the completed DPO runs.
Note: Probably the best loss curve at lr=5e-5.
Note: This run failed to train; definitely do not use it.
Note: The image shows a comparison between all the completed SFT runs.
Note: Probably the best loss curve at lr=5e-5.
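For context on the 4-bit QLoRA note above, a minimal loading sketch with Unsloth might look like the following. The model name comes from this collection; the LoRA rank, alpha, and target modules are placeholders rather than the runs' actual hyperparameters, which are documented in the wandb reports.

```python
# Minimal sketch: load the 4-bit base model with Unsloth and attach LoRA adapters.
# The LoRA settings below are illustrative placeholders, not the runs' actual hyperparameters.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.2-bnb-4bit",
    max_seq_length=2048,   # placeholder context length
    load_in_4bit=True,     # bitsandbytes 4-bit base, as noted above
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # placeholder LoRA rank
    lora_alpha=16,         # placeholder LoRA alpha
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```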
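For the SFT note above, here is a rough sketch of how the rejected responses can be ignored when reusing a preference-style dataset for SFT. The dataset id and the "prompt"/"chosen"/"rejected" column names are assumptions for illustration, not the actual dataset used in these runs.

```python
# Sketch: reuse a preference (chosen/rejected) dataset for SFT by keeping only
# the chosen responses. The dataset id and column names are hypothetical.
from datasets import load_dataset

preference_ds = load_dataset("org/preference-dataset", split="train")  # hypothetical id

def to_sft_text(example):
    # Concatenate prompt and chosen response; the rejected response is simply dropped.
    return {"text": example["prompt"] + example["chosen"]}

sft_ds = preference_ds.map(to_sft_text, remove_columns=preference_ds.column_names)
```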