Model Card for Gemma 2B Zephyr DPO

We fine-tuned google/gemma-2b with Direct Preference Optimization (DPO) on the argilla/dpo-mix-7k dataset. The hyperparameters were carefully selected to achieve the best DPO performance.
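
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair. It is a minimal illustration, not the training code used for this model. The function name `dpo_loss`, the example log-probabilities, and the value `beta=0.1` are assumptions chosen for illustration; this card does not state the hyperparameters actually used.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of a response under
    the policy (the model being trained) or the frozen reference model.
    beta=0.1 is an illustrative default, not a value taken from this card.
    """
    # Implicit rewards: how much the policy has shifted each response's
    # log-probability relative to the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is zero, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

The loss falls below log(2) only when the policy widens the gap between the chosen and rejected responses beyond what the reference model already assigns, which is the behavior DPO training rewards.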

Model description

  • Model type: A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: Gemma Terms of Use
  • Finetuned from model: google/gemma-2b

License

This model is released under the same license as the original Gemma model collection.

OpenLLM Leaderboard Performance

| Model | Avg. | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k |
|---|---|---|---|---|---|---|---|
| google/gemma-2b | 46.37 | 48.38 | 71.77 | 41.77 | 33.08 | 66.77 | 16.91 |
| google/gemma-2b-it | 42.75 | 43.94 | 62.70 | 37.65 | 45.82 | 60.93 | 5.46 |
| wandb/gemma-2b-zephyr-sft | 47.18 | 49.74 | 72.38 | 41.37 | 34.42 | 66.93 | 18.27 |
| wandb/gemma-2b-zephyr-dpo | 46.92 | 49.66 | 72.23 | 41.13 | 34.47 | 66.54 | 17.51 |
| Columbia-NLP/gemma-2b-zephyr-sft | 48.75 | 51.80 | 72.63 | 42.20 | 41.96 | 63.85 | 20.09 |
| Columbia-NLP/gemma-2b-zephyr-dpo | 49.14 | 52.22 | 73.11 | 42.55 | 42.64 | 64.40 | 19.94 |
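
The Avg. column is the unweighted mean of the six benchmark scores. As a quick sanity check, the sketch below recomputes it for the Columbia-NLP/gemma-2b-zephyr-dpo row; the variable names are illustrative and the scores are copied from the table above.

```python
from statistics import mean

# Benchmark scores for Columbia-NLP/gemma-2b-zephyr-dpo, copied from the table.
scores = {
    "ARC": 52.22,
    "HellaSwag": 73.11,
    "MMLU": 42.55,
    "TruthfulQA": 42.64,
    "Winogrande": 64.40,
    "GSM8k": 19.94,
}

# Unweighted mean over the six benchmarks, rounded to two decimals.
average = round(mean(scores.values()), 2)
print(average)  # 49.14
```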

MT-Bench

We evaluate our model with GPT-4-0125-preview as the judge.

| Model | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
|---|---|---|---|---|---|---|---|---|---|
| google/gemma-2b-it | 4.71 | 2.95 | 4.35 | 6.15 | 2.90 | 3.50 | 5.60 | 5.50 | 6.70 |
| wandb/gemma-2b-zephyr-sft | 4.03 | 3.10 | 3.15 | 5.00 | 2.70 | 2.65 | 5.10 | 4.80 | 5.75 |
| wandb/gemma-2b-zephyr-dpo | 4.06 | 2.80 | 2.90 | 5.55 | 2.65 | 2.70 | 5.20 | 4.80 | 5.85 |
| anakin87_gemma-2b-orpo | 4.14 | 3.00 | 3.70 | 6.30 | 2.70 | 2.35 | 5.68 | 4.75 | 4.75 |
| Columbia-NLP/gemma-2b-zephyr-sft | 4.34 | 3.10 | 3.70 | 6.25 | 2.65 | 2.70 | 5.55 | 5.25 | 5.50 |
| Columbia-NLP/gemma-2b-zephyr-dpo | 4.75 | 3.50 | 4.05 | 6.75 | 3.30 | 3.70 | 5.85 | 5.40 | 5.53 |

Model size: 2.51B params (Safetensors), tensor type BF16