	---
	license: cc-by-nc-4.0
	base_model: mlabonne/Marcoro14-7B-slerp
	datasets:
	- argilla/distilabel-intel-orca-dpo-pairs
	language:
	- en
	tags:
	- distilabel
	- dpo
	- rlaif
	- rlhf
	- merge
	- mergekit
	---
	# ⚗️ distilabeled Marcoro14 7B Slerp


	<p align="center">
	<a href="https://github.com/argilla-io/distilabel">
	<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
	</a>
	</p>


	## Introduction

	This model is a DPO fine-tune of [mlabonne/Marcoro14-7B-slerp](https://huggingface.co/mlabonne/Marcoro14-7B-slerp) on our new open dataset [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs). You can find more information about the "distilabeled" dataset in the [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B/blob/main/README.md#introduction) repo, and learn more about the tooling at [distilabel](https://github.com/argilla-io/distilabel).

	## Training details

	As we did with [Notus](https://argilla.io/blog/notus7b/), we wanted a reproducible recipe to test the impact of data quality.

	We're lucky to have so many amazing folks in the open community contributing reproducible, easy-to-use training scripts and recipes. This time, [Maxime Labonne](https://twitter.com/maximelabonne) had shared a [Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) to fine-tune OpenHermes with DPO and Intel's original dataset. Perfect! We just changed the base model to [mlabonne/Marcoro14-7B-slerp](https://huggingface.co/mlabonne/Marcoro14-7B-slerp), and applied the same dataset recipe we used for [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B/blob/main/README.md#introduction):

	```python
	from datasets import load_dataset

	# Instead of this:
	# dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

	# we did this
	dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")

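	# Keep only pairs that are not ties, whose chosen response scored at least 8,
	# and whose prompt does not appear in the GSM8K train split.
	dataset = dataset.filter(
	    lambda r:
	        r["status"] != "tie" and
	        r["chosen_score"] >= 8 and
	        not r["in_gsm8k_train"]
	)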
	```
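
To make the effect of each condition concrete, here is a minimal sketch that applies the same predicate to a few toy records in plain Python, so it runs without downloading anything. The field names come from the filter above; the record values, the `"unchanged"` and `"swapped"` status strings, and the `keep` helper are hypothetical placeholders, not rows or code from the actual recipe.

```python
# Toy records mimicking the three fields used by the filter above.
# All values are invented for illustration; they are not dataset rows.
toy_rows = [
    {"status": "tie", "chosen_score": 9.0, "in_gsm8k_train": False},        # dropped: tie
    {"status": "unchanged", "chosen_score": 7.0, "in_gsm8k_train": False},  # dropped: score < 8
    {"status": "swapped", "chosen_score": 8.0, "in_gsm8k_train": True},     # dropped: in GSM8K train
    {"status": "unchanged", "chosen_score": 8.0, "in_gsm8k_train": False},  # kept
]


def keep(r):
    """Same predicate as the lambda passed to dataset.filter (hypothetical helper name)."""
    return (
        r["status"] != "tie"
        and r["chosen_score"] >= 8
        and not r["in_gsm8k_train"]
    )


kept = [r for r in toy_rows if keep(r)]
print(f"kept {len(kept)} of {len(toy_rows)} toy rows")
```

Note that the score threshold is inclusive (`>= 8`), so a pair whose chosen response scored exactly 8 is retained.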

	## Benchmark results
	For benchmarking, we used the well-known "Nous" (or "Teknium") benchmark suite. You can find an overview below, including our first experiment with a less ambitious dataset filtering (removing ties and `score>5`).

	To run the benchmark, we used another awesome contribution from Maxime: [LLM AutoEval](https://github.com/mlabonne/llm-autoeval). Check it out!

	| Model | AGIEval | GPT4ALL | TruthfulQA | Bigbench | Average |
	|-------------------------|------:|------:|---------:|-------:|------:|
	| [argilla/distilabeled-Marcoro14-7B-slerp](https://huggingface.co/argilla/distilabeled-Marcoro14-7B-slerp) | 45.4 | 76.47 | 65.46 | 47.19 | 58.63 |
	| [Marcoro14-7B-slerp](https://huggingface.co/mlabonne/Marcoro14-7B-slerp) | 44.66 | 76.24 | 64.15 | 45.64 | 57.67 |
	| [argilla/distilabeled-Hermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-Hermes-2.5-Mistral-7B) | 44.64 | 73.35 | 55.96 | 42.21 | 54.04 |
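
As a quick sanity check on the table, the sketch below recomputes the Average column, assuming it is the unweighted mean of the four benchmark scores (the reported numbers are consistent with that assumption). The variable names are illustrative only.

```python
# Per-model scores copied from the table: AGIEval, GPT4ALL, TruthfulQA, Bigbench.
scores = {
    "argilla/distilabeled-Marcoro14-7B-slerp": [45.4, 76.47, 65.46, 47.19],
    "mlabonne/Marcoro14-7B-slerp": [44.66, 76.24, 64.15, 45.64],
    "argilla/distilabeled-Hermes-2.5-Mistral-7B": [44.64, 73.35, 55.96, 42.21],
}

# Assumption: Average = unweighted mean of the four columns, rounded to 2 decimals.
averages = {name: round(sum(vals) / len(vals), 2) for name, vals in scores.items()}

# Gain of the distilabeled model over its base model, in average points.
gain = round(
    averages["argilla/distilabeled-Marcoro14-7B-slerp"]
    - averages["mlabonne/Marcoro14-7B-slerp"],
    2,
)

for name, avg in averages.items():
    print(f"{name}: {avg}")
print(f"gain over base model: {gain}")
```

Under that assumption, the recomputed averages match the table, and the distilabeled model comes out a little under one point ahead of its base model on average.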

	### Training Hardware

	We used a single A100 80GB GPU on RunPod for less than one hour.

	## Acknowledgements

	We'd like to thank the amazing open community and in particular:

	* The Intel team for publishing a great open dataset and showing how well it worked in the first place.
	* Teknium and NousResearch for their awesome work and models.
	* Maxime for sharing such great resources.