G-reen
/

EXPERIMENT-DPO-m7b2-3-merged

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

EXPERIMENT-DPO-m7b2-3-merged / README.md

G-reen's picture

Update README.md

e011284 verified 7 months ago

|

1.3 kB

	---
	license: "apache-2.0"
	---

	This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth/Huggingface TRL.

	Note: This model failed to train because the LR was too high (stopped early at 300 steps). Do not use!

	Benchmarks

	Average 29.55

	ARC 29.52

	HellaSwag 25.9

	MMLU 23.12

	TruthfulQA 48.27

	WinograndeGSM8K 50.51

	GSM8K 0

	Training Details

	Duration: ~3 hours on one Kaggle T4 with Unsloth

	Model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit

	Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k

	Rank: 8

	Alpha: 16

	Learning rate: 5e-4

	Beta: 0.1

	Batch size: 8

	Epochs: 1

	Learning rate scheduler: Linear

	Prompt Format: ChatML
	```
	<\|im_start\|>system
	You are a helpful assistant.<\|im_end\|>
	<\|im_start\|>user
	Why is the sky blue?<\|im_end\|>
	<\|im_start\|>assistant
	```


	WanDB Reports

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/hICdKQjk2mkRODyhyUfVV.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/CZobUkPWirguxOoE_hll0.png)

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)