You can see the process with instructions for creating the model in the notebook: Aligning_DPO_gemma-2b.ipynb

To create it, we started with the google/gemma-2b-it model and applied DPO alignment using the distilabel-capybara-dpo-7k-binarized dataset.

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Row in the Dataset:

{'prompt': '<|user|>\nAssist me in calculating 9319357631 plus 595. Numbers and symbols only, please.<|end|>\n<|assistant|>\n', 'chosen': 'The sum of 9319357631 and 595 is 9319358226.<|end|>\n', 'rejected': 'The result of adding 9319357631 and 595 is 9319363626.<|end|>\n'}

Prompt: 3713841893836/4? Limit your response to mathematical expressions and symbols.

Response from the Base model:

To find the result of the division, we can simply divide the given number by 4: $$ \frac{3713841893836}{4} = 928460473459

Response from the fine-tuned model:

3713841893836 ÷ 4 = 928460473459

If you want to see how the model was created, you can check out the repository where the book's notebooks are kept up-to-date.

Downloads last month
20
Safetensors
Model size
2.51B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for oopere/martra-open-gemma-2b-it-dpo

Finetunes
1 model
Quantizations
1 model

Dataset used to train oopere/martra-open-gemma-2b-it-dpo