MistralTrinity-7B-slerp-dpo

Inspired by @mlabonne's blog post Fine-tune a Mistral-7b model with Direct Preference Optimization, this model was fine-tuned with DPO (Direct Preference Optimization) on the base model MistralTrinity-7B-slerp, a merge of mistralai/Mistral-7B-Instruct-v0.2 and jan-hq/trinity-v1, using the mlabonne/chatml_dpo_pairs dataset.
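
Below is a minimal sketch of what the DPO fine-tuning step looks like with TRL's DPOTrainer, in the spirit of the referenced blog post. It is not the exact training script: the hyperparameters, LoRA settings, and the column names in the mapping function are assumptions, and the DPOTrainer argument names shown follow older trl releases (newer versions move beta and length limits into a DPOConfig). Adjust the mapping to the actual schema of mlabonne/chatml_dpo_pairs.

```python
# Hedged sketch of DPO fine-tuning on the merged base model (assumes trl ~0.7.x APIs).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "wenqiglantz/MistralTrinity-7B-slerp"  # merged Mistral-Instruct-v0.2 + trinity-v1

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# DPOTrainer expects (prompt, chosen, rejected) text columns.
# The column names "system", "question", "chosen", "rejected" below are an assumption;
# reshape according to the real layout of mlabonne/chatml_dpo_pairs.
def to_dpo_format(example):
    prompt = (
        f"<|im_start|>system\n{example['system']}<|im_end|>\n"
        f"<|im_start|>user\n{example['question']}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }

dataset = load_dataset("mlabonne/chatml_dpo_pairs", split="train")
dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

# Illustrative LoRA config; the actual adapter settings used for this model may differ.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="MistralTrinity-7B-slerp-dpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    max_steps=200,
    logging_steps=1,
    report_to="wandb",  # logs the run to Weights & Biases
)

trainer = DPOTrainer(
    model,
    ref_model=None,            # with a PEFT adapter, trl derives the frozen reference model
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,                  # strength of the preference penalty in the DPO loss
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```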

The code to train this model is available on Google Colab and GitHub.

Training required an A100 GPU for over an hour.

Check out the fine-tuning run details on Weights & Biases.
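
For quick inference, the checkpoint can be loaded with the standard transformers API. The snippet below is a usage sketch, assuming the tokenizer ships a chat template inherited from Mistral-7B-Instruct-v0.2; the prompt and generation settings are illustrative only.

```python
# Hedged inference sketch for the published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenqiglantz/MistralTrinity-7B-slerp-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain Direct Preference Optimization in two sentences."}
]
# Assumes a chat template is available on the tokenizer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```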

Model size: 7.24B params (FP16, Safetensors)
