
Gemma 7B Zephyr DPO

The Zephyr DPO recipe applied on top of an SFT-finetuned Gemma 7B.

Model description

Model type: An 8.5B-parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
Language(s) (NLP): Primarily English
Finetuned from model: wandb/gemma-7b-zephyr-sft

Recipe

We trained with the DPO script from the Alignment Handbook recipe and logged the run to W&B.

Visit the W&B workspace here
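For readers new to DPO, the per-example objective the recipe optimizes can be sketched as follows. This is a minimal sketch of the standard sigmoid DPO loss, not the Alignment Handbook's implementation. It assumes the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy being trained and a frozen reference model (in DPO, typically the SFT starting point). The function names are hypothetical, and `beta = 0.1` is an illustrative placeholder, not the value used for this model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative placeholder, not this model's setting
) -> float:
    """Sigmoid DPO loss for a single preference pair (hypothetical helper).

    Each *_logp argument is the summed log-probability of a full response
    given the prompt.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response over the
    # rejected one by a larger margin than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

Raising the policy's log-probability for the chosen response (or lowering it for the rejected one) pushes the margin above zero and the loss below log(2). This is the only signal DPO uses, so no separate reward model is needed.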

License

This model has the same license as the original Gemma model collection.

Compute provided by Lambda Labs: one 8x A100 (80GB) node.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	61.62
AI2 Reasoning Challenge (25-Shot)	60.84
HellaSwag (10-Shot)	80.44
MMLU (5-Shot)	60.60
TruthfulQA (0-shot)	42.48
Winogrande (5-shot)	75.37
GSM8k (5-shot)	49.96
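On the Open LLM Leaderboard, the Avg. row is the unweighted mean of the six benchmark scores. The short check below recomputes it from the table. The dictionary layout is just an illustrative choice.

```python
# Open LLM Leaderboard scores from the table above (percent).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 60.84,
    "HellaSwag (10-Shot)": 80.44,
    "MMLU (5-Shot)": 60.60,
    "TruthfulQA (0-shot)": 42.48,
    "Winogrande (5-shot)": 75.37,
    "GSM8k (5-shot)": 49.96,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")
```

The mean comes out to about 61.615, consistent with the reported 61.62 after rounding to two decimals.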

Model size: 8.54B params (Safetensors, tensor type BF16)
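As a rough sizing aid, the parameter count and BF16 tensor type listed above imply the memory needed just to hold the weights. This back-of-the-envelope sketch counts weights only. It assumes 2 bytes per BF16 parameter and ignores activations, the KV cache, and (for training) gradients and optimizer state, so treat it as a lower bound.

```python
# Values from the model page: 8.54B parameters stored as BF16.
num_params = 8.54e9
bytes_per_param = 2  # bfloat16 is a 16-bit (2-byte) format

# Weights-only footprint; runtime memory will be higher.
weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
```

That is roughly 17 GB of weights, which fits comfortably on a single 80GB A100 for inference.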


Model tree for wandb/gemma-7b-zephyr-dpo

google/gemma-7b (base model) → wandb/gemma-7b-zephyr-sft (finetuned) → wandb/gemma-7b-zephyr-dpo (finetuned, this model)


Evaluation results (Open LLM Leaderboard)

Benchmark	Split	Metric	Value
AI2 Reasoning Challenge (25-Shot)	test	normalized accuracy	60.84
HellaSwag (10-Shot)	validation	normalized accuracy	80.44
MMLU (5-Shot)	test	accuracy	60.60
TruthfulQA (0-shot)	validation	mc2	42.48
Winogrande (5-shot)	validation	accuracy	75.37
GSM8k (5-shot)	test	accuracy	49.96
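Two of the metric names above are less common than plain accuracy. The sketch below shows how they are commonly computed from per-choice log-likelihoods in EleutherAI's lm-evaluation-harness, which the Open LLM Leaderboard used. It is a simplified illustration, not the harness code. The function names are hypothetical, and dividing by the character length of each choice is an assumption about the exact normalization used for "normalized accuracy".

```python
import math
from typing import Sequence


def acc_norm_prediction(logliks: Sequence[float], choices: Sequence[str]) -> int:
    """Index of the choice with the highest length-normalized log-likelihood.

    Hypothetical helper: divides each choice's log-likelihood by its
    character length so longer answers are not penalized for having
    more tokens.
    """
    scores = [ll / len(text) for ll, text in zip(logliks, choices)]
    return max(range(len(scores)), key=scores.__getitem__)


def mc2_score(true_logliks: Sequence[float], false_logliks: Sequence[float]) -> float:
    """TruthfulQA MC2-style score: share of probability mass on true answers.

    Hypothetical helper: converts log-likelihoods to probabilities
    (shifted by the max for numerical stability), then normalizes over
    all true and false reference answers.
    """
    shift = max(list(true_logliks) + list(false_logliks))
    p_true = [math.exp(ll - shift) for ll in true_logliks]
    p_false = [math.exp(ll - shift) for ll in false_logliks]
    return sum(p_true) / (sum(p_true) + sum(p_false))


# Raw argmax would pick "cat" (-6.0 > -9.0), but per-character scores favor "elephant".
print(acc_norm_prediction([-6.0, -9.0], ["cat", "elephant"]))
print(mc2_score([math.log(0.3)], [math.log(0.1), math.log(0.1)]))
```

In this toy example, length normalization flips the prediction to the longer answer. The MC2 score is 0.3 / (0.3 + 0.1 + 0.1) = 0.6.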
