---
license: other
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- argilla/dpo-mix-7k
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: Columbia-NLP/gemma-2b-zephyr-sft
model-index:
- name: gemma-2b-zephyr-dpo
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 52.22
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 73.11
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 42.55
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 42.64
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.4
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 19.94
      name: accuracy
---

# Model Card for Gemma 2B Zephyr DPO

We trained the [google/gemma-2b](https://huggingface.co/google/gemma-2b) with DPO and data from `argilla/dpo-mix-7k`.
We carefully selected the hyper-parameters to achieve the best DPO performance.

## Model description

- **Model type:** A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Gemma Terms of Use
- **Finetuned from model:** [google/gemma-2b](https://huggingface.co/google/gemma-2b)


## License
This model has the same license as the [original Gemma model collection](https://ai.google.dev/gemma/terms)

## OpenLLM Leaderboard Performance

| Models                                  | Avg. | ARC   | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k |
|-----------------------------------------|------|-------|-----------|------|------------|------------|-------|
| google/gemma-2b                         | 46.37| 48.38 | 71.77     | 41.77| 33.08      | 66.77      | 16.91 |
| google/gemma-2b-it                      | 42.75| 43.94 | 62.70     | 37.65| 45.82      | 60.93      | 5.46 |
| wandb/gemma-2b-zephyr-sft               | 47.18| 49.74 | 72.38     | 41.37| 34.42      | **66.93**      | 18.27 |
| wandb/gemma-2b-zephyr-dpo               | 46.92| 49.66 | 72.23     | 41.13| 34.47      | 66.54      | 17.51 |
| Columbia-NLP/gemma-2b-zephyr-sft      | 48.75| 51.80 | 72.63     | 42.20| 41.96      | 63.85      | **20.09** |
| **Columbia-NLP/gemma-2b-zephyr-dpo**        | **49.14**| **52.22** | **73.11**     | **42.55**| **42.64**      | 64.40      | 19.94 |


## MT-Bench

We evaluate our model with `GPT-4-0125-preview` as the judge.

| Model                                    | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
|------------------------------------------|-------|--------|------------|------------|------|-----------|----------|------|---------|
| google/gemma-2b-it                       | 4.71  | 2.95   | **4.35**       | 6.15       | 2.90 | 3.50      | 5.60     | **5.50** | **6.70**    |
| wandb/gemma-2b-zephyr-sft                | 4.03  | 3.10   | 3.15       | 5.00       | 2.70 | 2.65      | 5.10     | 4.80 | 5.75    |
| wandb/gemma-2b-zephyr-dpo                | 4.06  | 2.80   | 2.90       | 5.55       | 2.65 | 2.70      | 5.20     | 4.80 | 5.85    |
| anakin87_gemma-2b-orpo                | 4.14  | 3.00   | 3.70      | 6.30      | 2.70 | 2.35      | 5.68    | 4.75 | 4.75    |
| Columbia-NLP/gemma-2b-zephyr-sft     | 4.34  | 3.10   | 3.70       | 6.25       | 2.65 | 2.70      | 5.55     | 5.25 | 5.50    |
| **Columbia-NLP/gemma-2b-zephyr-dpo**         | **4.75**  | **3.50**   | 4.05       | **6.75**       | **3.30** | **3.70**      | **5.85**     | 5.40 | 5.53    |