
Model Card for Gemma 2B Zephyr DPO

We fine-tuned google/gemma-2b with Direct Preference Optimization (DPO) on data from argilla/dpo-mix-7k, and tuned the hyperparameters to get the best DPO performance.
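
For readers new to DPO, the per-pair objective from the original DPO formulation can be sketched as below. This is a minimal illustration, not the training code used for this model. It assumes each log-probability is the summed token log-probability of a full response, and `beta=0.1` is a hypothetical value because the card does not report the one used.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical; the card does not state the beta used
) -> float:
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a whole response under
    the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi_policy / pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Minimizing -log sigmoid(margin) pushes the chosen reward above the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


if __name__ == "__main__":
    # Policy identical to the reference: the margin is 0, so the loss is log(2).
    print(dpo_loss(-42.0, -40.0, -42.0, -40.0))
    # Policy has moved probability toward the chosen response: loss falls below log(2).
    print(dpo_loss(-38.0, -44.0, -42.0, -40.0))
```

Because only log-ratios against the reference enter the loss, the policy is rewarded for preferring the chosen response more than the reference does, not for raising its absolute probability.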

Model description

  • Model type: A 2.5B-parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: Gemma Terms of Use
  • Finetuned from model: google/gemma-2b

License

This model has the same license as the original Gemma model collection.

OpenLLM Leaderboard Performance

Model                             Avg.   ARC    HellaSwag  MMLU   TruthfulQA  Winogrande  GSM8k
google/gemma-2b                   46.37  48.38  71.77      41.77  33.08       66.77       16.91
google/gemma-2b-it                42.75  43.94  62.70      37.65  45.82       60.93       5.46
wandb/gemma-2b-zephyr-sft         47.18  49.74  72.38      41.37  34.42       66.93       18.27
wandb/gemma-2b-zephyr-dpo         46.92  49.66  72.23      41.13  34.47       66.54       17.51
Columbia-NLP/gemma-2b-zephyr-sft  48.75  51.80  72.63      42.20  41.96       63.85       20.09
Columbia-NLP/gemma-2b-zephyr-dpo  49.14  52.22  73.11      42.55  42.64       64.40       19.94
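
Assuming the Avg. column is the unweighted mean of the six benchmarks (as on the Open LLM Leaderboard), the sketch below recomputes it for the two Columbia-NLP rows, with values copied from the table, and reports the per-benchmark change from SFT to DPO. The dictionary layout and helper names are illustrative choices, not part of any evaluation tool.

```python
from statistics import mean

BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8k"]

# Scores copied from the Open LLM Leaderboard table above.
SCORES = {
    "Columbia-NLP/gemma-2b-zephyr-sft": [51.80, 72.63, 42.20, 41.96, 63.85, 20.09],
    "Columbia-NLP/gemma-2b-zephyr-dpo": [52.22, 73.11, 42.55, 42.64, 64.40, 19.94],
}


def leaderboard_average(scores: list[float]) -> float:
    """Unweighted mean over the six benchmarks (assumed to match Avg.)."""
    return mean(scores)


def per_benchmark_delta(before: list[float], after: list[float]) -> dict[str, float]:
    """Score change per benchmark (after minus before), rounded to two decimals."""
    return {name: round(a - b, 2) for name, b, a in zip(BENCHMARKS, before, after)}


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {leaderboard_average(scores):.3f}")
    sft = SCORES["Columbia-NLP/gemma-2b-zephyr-sft"]
    dpo = SCORES["Columbia-NLP/gemma-2b-zephyr-dpo"]
    print(per_benchmark_delta(sft, dpo))
```

For these two rows the recomputed means match the reported Avg. to within rounding, and the deltas show DPO improving five of the six benchmarks over the SFT checkpoint, with a small drop on GSM8k.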

MT-Bench

We evaluate the models below on MT-Bench with GPT-4-0125-preview as the judge; scores are on a 1 to 10 scale.

Model                             Total  Coding  Extraction  Humanities  Math  Reasoning  Roleplay  STEM  Writing
google/gemma-2b-it                4.71   2.95    4.35        6.15        2.90  3.50       5.60      5.50  6.70
wandb/gemma-2b-zephyr-sft         4.03   3.10    3.15        5.00        2.70  2.65       5.10      4.80  5.75
wandb/gemma-2b-zephyr-dpo         4.06   2.80    2.90        5.55        2.65  2.70       5.20      4.80  5.85
anakin87_gemma-2b-orpo            4.14   3.00    3.70        6.30        2.70  2.35       5.68      4.75  4.75
Columbia-NLP/gemma-2b-zephyr-sft  4.34   3.10    3.70        6.25        2.65  2.70       5.55      5.25  5.50
Columbia-NLP/gemma-2b-zephyr-dpo  4.75   3.50    4.05        6.75        3.30  3.70       5.85      5.40  5.53
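
MT-Bench has the same number of questions (10, each with two turns) in every category, so the Total should be close to the unweighted mean of the eight category scores. The sketch below checks that assumption for the Columbia-NLP rows, using values copied from the table. Differences of about 0.01 are expected from rounding of the per-category scores and, possibly, from missing judgments; the latter is an assumption, not something this card states.

```python
from statistics import mean

CATEGORIES = ["Coding", "Extraction", "Humanities", "Math",
              "Reasoning", "Roleplay", "STEM", "Writing"]

# (reported Total, per-category scores) copied from the MT-Bench table above.
ROWS = {
    "Columbia-NLP/gemma-2b-zephyr-sft": (4.34, [3.10, 3.70, 6.25, 2.65, 2.70, 5.55, 5.25, 5.50]),
    "Columbia-NLP/gemma-2b-zephyr-dpo": (4.75, [3.50, 4.05, 6.75, 3.30, 3.70, 5.85, 5.40, 5.53]),
}


def category_mean(scores: list[float]) -> float:
    """Unweighted mean of the eight category scores (1-10 scale)."""
    if len(scores) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} scores, got {len(scores)}")
    return mean(scores)


if __name__ == "__main__":
    for model, (reported, scores) in ROWS.items():
        recomputed = category_mean(scores)
        print(f"{model}: reported {reported:.2f}, recomputed {recomputed:.3f}, "
              f"diff {recomputed - reported:+.3f}")
```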

Safetensors

  • Model size: 2.51B params
  • Tensor type: BF16
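
As a rough sizing aid, BF16 stores each parameter in 2 bytes, so 2.51B parameters need about 5.0 GB (roughly 4.7 GiB) just to hold the weights. The sketch below does that arithmetic. It deliberately ignores activations, the KV cache, and framework overhead, and the 2.51B figure is itself rounded, so treat the result as a lower bound.

```python
# Bytes per parameter for common storage dtypes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_bytes(num_params: float, dtype: str = "bf16") -> float:
    """Bytes needed to store the weights alone (no activations or KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    n = 2.51e9  # parameter count reported on the model page (rounded)
    for dtype in ("bf16", "fp32"):
        size = weight_memory_bytes(n, dtype)
        print(f"{dtype}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```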
