G-reen committed on
Commit bbc9f45
1 Parent(s): 5d693b4

Update README.md

Files changed (1): README.md (+13 -1)
 
*This model was trained as part of a series of experiments comparing pure DPO, SFT, and ORPO, all supported by Unsloth and Hugging Face TRL.*

**Training Details**

Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k

LoRA rank: 8

LoRA alpha: 16

Learning rate: 5e-6

DPO beta: 0.1

Batch size: 8

Epochs: 1

Learning rate scheduler: Linear

Prompt format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
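The settings above can be tied together in a minimal, stdlib-only Python sketch. It is an illustration only and is not the training code used for this model. `format_example`, `dpo_loss`, and `linear_lr` are hypothetical helpers.

The sketch makes three assumptions:

- It uses the standard sigmoid DPO loss, which is the default `loss_type` in TRL's `DPOTrainer`.
- It treats `PROMPT` and `RESPONSE` in the template as plain placeholders.
- It assumes zero warmup steps for the linear scheduler, since no warmup is listed.

```python
import math

# Values copied from the Training Details; everything else is illustrative.
LEARNING_RATE = 5e-6
BETA = 0.1

# Prompt template from the prompt format; PROMPT/RESPONSE are filled via str.format.
TEMPLATE = "You are a helpful assistant.<s>[INST] {prompt} [/INST]{response}</s>"


def format_example(prompt: str, response: str) -> str:
    """Render one prompt/response pair in the listed prompt format (hypothetical helper)."""
    return TEMPLATE.format(prompt=prompt, response=response)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """Sigmoid DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a response under the
    policy being trained or under the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-logits)


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = 0) -> float:
    """Linear warmup, then linear decay to zero (warmup_steps=0 is an assumption)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(format_example("What is DPO?", "Direct Preference Optimization."))
    # While the policy still equals the reference model, the loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    print(linear_lr(50, 100))
```

The loss is written with a stable softplus so that very large or very negative margins do not overflow `math.exp`.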
 
**WandB Reports**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/8DQ0WiypkVIJeK_Y18Wv0.png)