unsloth/mistral-7b-v0.2-bnb-4bit (Text Generation)
Several trained models to compare the differences between each training method. Each model has a complete description of its hyperparameters, with wandb reports.
Note: All training runs were done on this model (4-bit QLoRA). Go Unsloth! A minimal loading sketch follows these notes.
Note: This entire dataset was used for training. For SFT, the rejected part of the dataset was ignored (see the conversion sketch after these notes).
Note: The image shows a comparison between all the completed DPO runs.
Note: Probably the best loss curve at lr=5e-5.
Note: This run failed to train; definitely do not use it.
Note: The image shows a comparison between all the completed SFT runs.
Note: Probably the best loss curve at lr=5e-5.
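For context on the 4-bit QLoRA note above, a minimal loading sketch with Unsloth might look like the following. The model name comes from this collection; the LoRA rank, alpha, and target modules are placeholders rather than the runs' actual hyperparameters, which are documented in the wandb reports.

```python
# Minimal sketch: load the 4-bit base model with Unsloth and attach LoRA adapters.
# The LoRA settings below are illustrative placeholders, not the runs' actual hyperparameters.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.2-bnb-4bit",
    max_seq_length=2048,   # placeholder context length
    load_in_4bit=True,     # bitsandbytes 4-bit base, as noted above
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                  # placeholder LoRA rank
    lora_alpha=16,         # placeholder LoRA alpha
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```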
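For the SFT note above, here is a rough sketch of how the rejected responses can be ignored when reusing a preference-style dataset for SFT. The dataset id and the "prompt"/"chosen"/"rejected" column names are assumptions for illustration, not the actual dataset used in these runs.

```python
# Sketch: reuse a preference (chosen/rejected) dataset for SFT by keeping only
# the chosen responses. The dataset id and column names are hypothetical.
from datasets import load_dataset

preference_ds = load_dataset("org/preference-dataset", split="train")  # hypothetical id

def to_sft_text(example):
    # Concatenate prompt and chosen response; the rejected response is simply dropped.
    return {"text": example["prompt"] + example["chosen"]}

sft_ds = preference_ds.map(to_sft_text, remove_columns=preference_ds.column_names)
```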