---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
base_model: google/gemma-2b
tags:
- alignment-handbook
- trl
- sft
- generated_from_trainer
datasets:
- HuggingFaceH4/deita-10k-v0-sft
model-index:
- name: gemma-2b-zephyr-sft
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 51.88
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 72.63
      name: normalized accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 42.20
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 41.96
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.85
      name: accuracy
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 20.09
      name: accuracy
---

# Model Card for Gemma 2B Zephyr SFT

We fine-tuned [google/gemma-2b](https://huggingface.co/google/gemma-2b) on [deita-10k-v0-sft](https://huggingface.co/datasets/HuggingFaceH4/deita-10k-v0-sft). We carefully selected the hyperparameters and masked the user tokens during training (see the sketch below) to achieve the best supervised fine-tuning performance.

## Model description

- **Model type:** A 2.5B-parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Gemma Terms of Use
- **Finetuned from model:** [google/gemma-2b](https://huggingface.co/google/gemma-2b)

## License

This model has the same license as the [original Gemma model collection](https://ai.google.dev/gemma/terms).
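## Training

User-token masking is standard SFT loss masking: positions belonging to the user turn get label `-100`, which PyTorch's cross-entropy loss ignores, so the model learns only the assistant tokens. The sketch below is illustrative, not the exact training code; it assumes Gemma's `<start_of_turn>`/`<end_of_turn>` chat format and a hypothetical single-turn example.

```python
# Illustrative sketch of user-token masking for SFT (not the exact training code).
# Labels set to -100 are ignored by the cross-entropy loss, so only the
# assistant's reply contributes to the training signal.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

prompt = "<start_of_turn>user\nWhat is SFT?<end_of_turn>\n<start_of_turn>model\n"
reply = "Supervised fine-tuning on instruction-response pairs.<end_of_turn>\n"

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
reply_ids = tokenizer(reply, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + reply_ids
labels = [-100] * len(prompt_ids) + reply_ids  # mask the user/prompt tokens
```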
## OpenLLM Leaderboard Performance

| Models                               | Avg.  | ARC   | HellaSwag | MMLU  | TruthfulQA | Winogrande | GSM8k |
|--------------------------------------|-------|-------|-----------|-------|------------|------------|-------|
| google/gemma-2b                      | 46.37 | 48.38 | 71.77     | 41.77 | 33.08      | 66.77      | 16.91 |
| google/gemma-2b-it                   | 42.75 | 43.94 | 62.70     | 37.65 | 45.82      | 60.93      | 5.46  |
| wandb/gemma-2b-zephyr-sft            | 47.18 | 49.74 | 72.38     | 41.37 | 34.42      | 66.93      | 18.27 |
| wandb/gemma-2b-zephyr-dpo            | 46.92 | 49.66 | 72.23     | 41.13 | 34.47      | 66.54      | 17.51 |
| **Columbia-NLP/gemma-2b-zephyr-sft** | 48.75 | 51.80 | 72.63     | 42.20 | 41.96      | 63.85      | 20.09 |
| Columbia-NLP/gemma-2b-zephyr-dpo     | 49.14 | 52.22 | 73.11     | 42.55 | 42.64      | 64.40      | 19.94 |

## MT-Bench

GPT-4-0125-preview as judge.

| Model                                | Total | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
|--------------------------------------|-------|--------|------------|------------|------|-----------|----------|------|---------|
| google/gemma-2b-it                   | 4.71  | 2.95   | 4.35       | 6.15       | 2.90 | 3.50      | 5.60     | 5.50 | 6.70    |
| wandb/gemma-2b-zephyr-sft            | 4.03  | 3.10   | 3.15       | 5.00       | 2.70 | 2.65      | 5.10     | 4.80 | 5.75    |
| wandb/gemma-2b-zephyr-dpo            | 4.06  | 2.80   | 2.90       | 5.55       | 2.65 | 2.70      | 5.20     | 4.80 | 5.85    |
| **Columbia-NLP/gemma-2b-zephyr-sft** | 4.34  | 3.10   | 3.70       | 6.25       | 2.65 | 2.70      | 5.55     | 5.25 | 5.50    |
| Columbia-NLP/gemma-2b-zephyr-dpo     | 4.75  | 3.50   | 4.05       | 6.75       | 3.30 | 3.70      | 5.85     | 5.40 | 5.53    |
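## Example Usage

A minimal inference sketch with 🤗 Transformers is below. It assumes the uploaded tokenizer ships a Gemma-style chat template (not stated in this card); the prompt and generation parameters are illustrative.

```python
# Minimal inference sketch (assumes the tokenizer ships a Gemma-style chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Columbia-NLP/gemma-2b-zephyr-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain supervised fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```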