--- license: other tags: - alignment-handbook - trl - dpo - generated_from_trainer datasets: - argilla/dpo-mix-7k license_name: gemma-terms-of-use license_link: https://ai.google.dev/gemma/terms base_model: Columbia-NLP/gemma-2b-zephyr-sft model-index: - name: gemma-2b-zephyr-dpo results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 52.22 name: normalized accuracy - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 73.11 name: normalized accuracy - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 42.55 name: accuracy - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 42.64 - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 64.4 name: accuracy - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 19.94 name: accuracy --- # Model Card for Gemma 2B Zephyr DPO We trained the [google/gemma-2b](https://huggingface.co/google/gemma-2b) with DPO and data from `argilla/dpo-mix-7k`. We carefully selected the hyper-parameters to achieve the best DPO performance. ## Model description - **Model type:** A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets. - **Language(s) (NLP):** Primarily English - **License:** Gemma Terms of Use - **Finetuned from model:** [google/gemma-2b](https://huggingface.co/google/gemma-2b) ## License This model has the same license as the [original Gemma model collection](https://ai.google.dev/gemma/terms) ## OpenLLM Leaderboard Performance | Models | Avg. | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k | |-----------------------------------------|------|-------|-----------|------|------------|------------|-------| | google/gemma-2b | 46.37| 48.38 | 71.77 | 41.77| 33.08 | 66.77 | 16.91 | | google/gemma-2b-it | 42.75| 43.94 | 62.70 | 37.65| 45.82 | 60.93 | 5.46 | | wandb/gemma-2b-zephyr-sft | 47.18| 49.74 | 72.38 | 41.37| 34.42 | **66.93** | 18.27 | | wandb/gemma-2b-zephyr-dpo | 46.92| 49.66 | 72.23 | 41.13| 34.47 | 66.54 | 17.51 | | Columbia-NLP/gemma-2b-zephyr-sft | 48.75| 51.80 | 72.63 | 42.20| 41.96 | 63.85 | **20.09** | | **Columbia-NLP/gemma-2b-zephyr-dpo** | **49.14**| **52.22** | **73.11** | **42.55**| **42.64** | 64.40 | 19.94 | ## MT-Bench We evaluate our model with `GPT-4-0125-preview` as the judge. | Model | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing | |------------------------------------------|-------|--------|------------|------------|------|-----------|----------|------|---------| | google/gemma-2b-it | 4.71 | 2.95 | **4.35** | 6.15 | 2.90 | 3.50 | 5.60 | **5.50** | **6.70** | | wandb/gemma-2b-zephyr-sft | 4.03 | 3.10 | 3.15 | 5.00 | 2.70 | 2.65 | 5.10 | 4.80 | 5.75 | | wandb/gemma-2b-zephyr-dpo | 4.06 | 2.80 | 2.90 | 5.55 | 2.65 | 2.70 | 5.20 | 4.80 | 5.85 | | anakin87_gemma-2b-orpo | 4.14 | 3.00 | 3.70 | 6.30 | 2.70 | 2.35 | 5.68 | 4.75 | 4.75 | | Columbia-NLP/gemma-2b-zephyr-sft | 4.34 | 3.10 | 3.70 | 6.25 | 2.65 | 2.70 | 5.55 | 5.25 | 5.50 | | **Columbia-NLP/gemma-2b-zephyr-dpo** | **4.75** | **3.50** | 4.05 | **6.75** | **3.30** | **3.70** | **5.85** | 5.40 | 5.53 |