Update README.md
README.md CHANGED
Removed (old model card front matter and upload notes):

```
---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- dpo
base_model: unsloth/mistral-7b-v0.2-bnb-4bit
---

- **License:** apache-2.0
- **Finetuned from model :** unsloth/mistral-7b-v0.2-bnb-4bit
```

Added:
---
license: "apache-2.0"
---

*This model was trained as part of a series of experiments testing the performance of pure DPO vs. SFT vs. ORPO, all supported by Unsloth and the Hugging Face TRL library.*
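For context, DPO trains directly on preference pairs with a logistic loss on the policy-vs-reference log-probability margin. A minimal plain-Python sketch of the per-pair loss (illustrative only, not the Unsloth/TRL implementation; argument names are made up for clarity):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the full response
    under the trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))

# When the policy matches the reference, the margin is 0 and the
# loss is -log(0.5) = log(2); widening the preference gap lowers it.
loss = dpo_loss(-9.0, -13.0, -10.0, -12.0, beta=0.1)
```

Beta scales how strongly the policy is pushed away from the reference model; the 0.1 used here is the common default.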
Note: this model is extremely buggy and not recommended for use. However, it did not massively overfit like #3, so it may still be usable.

Training was somewhat unstable; the optimal learning-rate range appears to be around [1e-5, 1e-4].

**Benchmarks**

TBA

**Training Details**

- Duration: ~10-12 hours on one Kaggle T4 with Unsloth
- Model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit
- Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k
- Rank: 8
- Alpha: 16
- Learning rate: 1e-4
- Beta: 0.1
- Batch size: 8
- Epochs: 1
- Learning rate scheduler: Linear
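A run with these settings could be set up with Unsloth and TRL roughly as follows. This is a hypothetical sketch, not the actual training script: exact argument names vary across TRL/Unsloth versions, and details such as the LoRA target modules and output directory are assumptions.

```python
from unsloth import FastLanguageModel
from trl import DPOTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model listed on the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/mistral-7b-v0.2-bnb-4bit", load_in_4bit=True
)
# Rank 8 / Alpha 16, per the hyperparameters above; target modules assumed.
model = FastLanguageModel.get_peft_model(
    model, r=8, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with PEFT adapters, the reference model is implicit
    beta=0.1,
    train_dataset=load_dataset("argilla/dpo-mix-7k", split="train"),
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        lr_scheduler_type="linear",
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```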
Prompt Format: ChatML

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
```
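In code, the format above can be produced with a small helper (an illustrative sketch; in practice the tokenizer's `apply_chat_template` with a ChatML chat template does the same job):

```python
def format_chatml(messages):
    """Render a list of {role, content} dicts in ChatML,
    ending with an open assistant turn for generation."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"
    return out

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
])
```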
**WandB Reports**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Ww-urn-b22jj2sr735rNs.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/hVzS-W9SGA8TZn65ixF84.png)

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)