G-reen committed
Commit 859f70d
Parent: 8df6c20

Update README.md

Files changed (1)
  1. README.md +47 -17
README.md CHANGED
@@ -1,23 +1,53 @@
  ---
- language:
- - en
- license: apache-2.0
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - mistral
- - trl
- - dpo
- base_model: unsloth/mistral-7b-v0.2-bnb-4bit
  ---

- # Uploaded model

- - **Developed by:** G-reen
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/mistral-7b-v0.2-bnb-4bit

- This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
+ license: "apache-2.0"
  ---

+ *This model was trained as part of a series of experiments testing the performance of pure DPO vs. SFT vs. ORPO, all supported by Unsloth and Hugging Face's TRL library.*

+ Note: This model is extremely buggy and not recommended for use. However, it did not massively overfit like #3, so it may still be usable.

+ Training was somewhat unstable, so the optimal learning-rate range appears to be around [1e-5, 1e-4].

+ **Benchmarks**
+
+ TBA
+
+ **Training Details**
+
+ Duration: ~10-12 hours on one Kaggle T4 with Unsloth
+
+ Model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit
+
+ Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k
+
+ Rank: 8
+
+ Alpha: 16
+
+ Learning rate: 1e-4
+
+ Beta: 0.1
+
+ Batch size: 8
+
+ Epochs: 1
+
+ Learning rate scheduler: Linear
+
+ Prompt Format: ChatML
+ ```
+ <|im_start|>system
+ You are a helpful assistant.<|im_end|>
+ <|im_start|>user
+ Why is the sky blue?<|im_end|>
+ <|im_start|>assistant
+ ```
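+
+ The sketch below shows roughly how the settings above map onto an Unsloth + TRL DPO run. It is a reconstruction from the listed hyperparameters, not the exact training script: the dataset preprocessing, LoRA target modules, sequence lengths, and the older keyword-style `DPOTrainer` interface (newer TRL versions move these options into `DPOConfig`) are assumptions.
+
+ ```python
+ # Reconstruction under assumptions, not the original script. Hyperparameters
+ # come from the list above; everything else (target modules, max lengths,
+ # preprocessing of argilla/dpo-mix-7k into ChatML prompt/chosen/rejected
+ # strings) is assumed.
+ import torch
+ from datasets import load_dataset
+ from transformers import TrainingArguments
+ from trl import DPOTrainer
+ from unsloth import FastLanguageModel
+
+ max_seq_length = 2048  # assumed
+
+ # 4-bit base model + LoRA adapters (Rank 8, Alpha 16).
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="unsloth/mistral-7b-v0.2-bnb-4bit",
+     max_seq_length=max_seq_length,
+     load_in_4bit=True,
+ )
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=8,
+     lora_alpha=16,
+     lora_dropout=0,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],  # assumed
+ )
+
+ def to_chatml(example):
+     # Assumed preprocessing: render the shared turns as a ChatML prompt
+     # (matching the prompt format above) and keep the final assistant
+     # messages as the chosen/rejected completions.
+     def render(messages):
+         return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
+                        for m in messages)
+     return {
+         "prompt": render(example["chosen"][:-1]) + "<|im_start|>assistant\n",
+         "chosen": example["chosen"][-1]["content"] + "<|im_end|>",
+         "rejected": example["rejected"][-1]["content"] + "<|im_end|>",
+     }
+
+ raw = load_dataset("argilla/dpo-mix-7k", split="train")
+ train_dataset = raw.map(to_chatml, remove_columns=raw.column_names)
+
+ trainer = DPOTrainer(
+     model=model,
+     ref_model=None,          # with PEFT, the frozen base weights serve as the reference
+     beta=0.1,                # DPO beta
+     train_dataset=train_dataset,
+     tokenizer=tokenizer,
+     max_length=max_seq_length,
+     max_prompt_length=512,   # assumed
+     args=TrainingArguments(
+         per_device_train_batch_size=8,  # "Batch size: 8" (assumed per-device)
+         num_train_epochs=1,
+         learning_rate=1e-4,
+         lr_scheduler_type="linear",
+         optim="adamw_8bit",
+         logging_steps=1,
+         output_dir="outputs",
+         fp16=not torch.cuda.is_bf16_supported(),
+         bf16=torch.cuda.is_bf16_supported(),
+     ),
+ )
+ trainer.train()
+ ```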
+
+ **WandB Reports**
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Ww-urn-b22jj2sr735rNs.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/hVzS-W9SGA8TZn65ixF84.png)
+
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
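+
+ For completeness, here is a minimal usage sketch built around the ChatML format above. It is an assumption about how the model is meant to be prompted, not part of the original card: the repository id is a placeholder, and if this repo holds LoRA adapters rather than merged weights, they would need to be loaded with peft on top of the base model instead.
+
+ ```python
+ # Minimal inference sketch (assumed usage, not from the original card).
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_ID = "<this-repository-id>"  # placeholder: replace with this repo's id
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
+
+ # Build the prompt exactly as shown in the ChatML example above.
+ prompt = (
+     "<|im_start|>system\n"
+     "You are a helpful assistant.<|im_end|>\n"
+     "<|im_start|>user\n"
+     "Why is the sky blue?<|im_end|>\n"
+     "<|im_start|>assistant\n"
+ )
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ output = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
+                        skip_special_tokens=True))
+ ```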