oopere
/

martra-open-gemma-2b-it-dpo

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

martra-open-gemma-2b-it-dpo / README.md

oopere's picture

Upload GemmaForCausalLM

ec47410 verified 6 months ago

|

history blame contribute delete

1.98 kB

	---
	language:
	- en
	license: mit
	library_name: transformers
	tags:
	- dpo
	- 2b
	- gemma
	datasets:
	- argilla/distilabel-capybara-dpo-7k-binarized
	pipeline_tag: text-generation
	widget:
	- text: "3713841893836/4? \nLimit your response to mathematical expressions and symbols."
	example_title: 'Return only numbers. '
	- text: A group of 10 people is split into 3 different committees of 3, 4, and 3 people,
	respectively. In how many ways can this be done?
	example_title: Solve Problem
	---

	You can see the process with instructions for creating the model in the notebook: [Aligning_DPO_gemma-2b.ipynb](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/P2-MHF/Aligning_DPO_open_gemma-2b-it.ipynb)

	To create it, we started with the [google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it) model and applied DPO alignment using the [distilabel-capybara-dpo-7k-binarized dataset](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized).

	Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

	Row in the Dataset:

	*{'prompt': '<\|user\|>\nAssist me in calculating 9319357631 plus 595. Numbers and symbols only, please.<\|end\|>\n<\|assistant\|>\n',
	'chosen': 'The sum of 9319357631 and 595 is 9319358226.<\|end\|>\n',
	'rejected': 'The result of adding 9319357631 and 595 is 9319363626.<\|end\|>\n'}*

	Prompt:
	*3713841893836/4?
	Limit your response to mathematical expressions and symbols.*

	Response from the Base model:

	*To find the result of the division, we can simply divide the given number by 4:
	$$
	\frac{3713841893836}{4} = 928460473459*

	Response from the fine-tuned model:

	3713841893836 ÷ 4 = 928460473459


	If you want to see how the model was created, you can check out the [repository](https://github.com/peremartra/Large-Language-Model-Notebooks-Course) where the book's notebooks are kept up-to-date.