---
license: "apache-2.0"
---

*This model was trained as part of a series of experiments comparing the performance of pure DPO, SFT, and ORPO, all implemented with Unsloth and Hugging Face TRL.*

Note: Completely broken. Do not use.

**Benchmarks**

| Benchmark | Score |
|------------|-------|
| Average | 59.52 |
| ARC | 59.47 |
| HellaSwag | 82.42 |
| MMLU | 62.21 |
| TruthfulQA | 40.01 |
| Winogrande | 78.3 |
| GSM8K | 34.72 |

**Training Details**

- Duration: ~10-12 hours on a single Kaggle T4 with Unsloth
- Base model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit
- Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k
- LoRA rank: 8
- LoRA alpha: 16
- Learning rate: 5e-6
- DPO beta: 0.1
- Batch size: 8
- Epochs: 1
- Learning rate scheduler: linear
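
For reference, below is a minimal sketch of this setup with Unsloth and TRL. It is an approximation, not the original training script: the max sequence length, LoRA target modules, per-device batch split, and the dataset flattening are assumptions, and the TRL API shown (`DPOConfig`/`DPOTrainer`) reflects recent TRL versions.

```python
# Hedged sketch of the DPO run described above; hyperparameters mirror the
# card, and everything marked "assumed" is not stated in it.
from unsloth import FastLanguageModel, PatchDPOTrainer
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

PatchDPOTrainer()  # let Unsloth patch TRL's DPOTrainer before use

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.2-bnb-4bit",
    max_seq_length=2048,   # assumed; the card does not state it
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=8,             # LoRA rank: 8
    lora_alpha=16,   # LoRA alpha: 16
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed
)

def to_dpo_format(row):
    # dpo-mix-7k stores chosen/rejected as chat-message lists; flatten them
    # into prompt/chosen/rejected strings (single-turn assumption), using the
    # prompt format described just below.
    user = row["chosen"][0]["content"]
    return {
        "prompt": f"You are a helpful assistant.<s>[INST] {user} [/INST]",
        "chosen": row["chosen"][-1]["content"],
        "rejected": row["rejected"][-1]["content"],
    }

dataset = load_dataset("argilla/dpo-mix-7k", split="train").map(to_dpo_format)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        per_device_train_batch_size=2,  # 2 x 4 accumulation = batch size 8 (assumed split)
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        beta=0.1,                       # DPO beta
        num_train_epochs=1,
        lr_scheduler_type="linear",
        output_dir="outputs",
    ),
    train_dataset=dataset,
    tokenizer=tokenizer,  # `processing_class` in newer TRL releases
)
trainer.train()
```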

Prompt Format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>``` (the start token `<s>` must be added manually; it is not inserted automatically by the tokenizer)
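
To illustrate that note, here is a hedged inference example: because `<s>` is embedded in the prompt string itself, automatic special-token insertion must be disabled when tokenizing. The repo id is a placeholder, not this model's actual name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/your-dpo-model"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The <s> BOS token is already written into the string, so tell the
# tokenizer not to insert special tokens a second time.
prompt = "You are a helpful assistant.<s>[INST] PROMPT [/INST]"
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```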


**WandB Reports**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/8DQ0WiypkVIJeK_Y18Wv0.png)

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)