G-reen committed on
Commit
8becdf3
1 Parent(s): 2a3c3d8

Update README.md

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -1,13 +1,21 @@
 This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth/Huggingface TRL.
 
 Rank: 8
+
 Alpha: 16
+
 Learning rate: 5e-6
+
 Beta: 0.1
+
 Batch size: 8
+
 Epochs: 1
+
 Learning rate schedulers: Linear
+
 Prompt Format:
+
 ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)
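
The README's settings and prompt format could be captured in a few lines of Python. This is only a minimal sketch: the values and the template string come from the README, while the names `HYPERPARAMS`, `SYSTEM_PROMPT`, and `format_prompt` are hypothetical and not part of the author's training code. It also assumes that "Rank" and "Alpha" refer to LoRA adapter settings, and that `<s>` and `</s>` are written as literal text rather than inserted by the tokenizer; neither is confirmed by the README.

```python
# Values copied from the README; key names are illustrative assumptions,
# not the argument names of any particular library.
HYPERPARAMS = {
    "rank": 8,                 # presumably the LoRA rank (assumption)
    "alpha": 16,               # presumably the LoRA alpha (assumption)
    "learning_rate": 5e-6,
    "beta": 0.1,
    "batch_size": 8,
    "epochs": 1,
    "lr_scheduler": "linear",
}

# System text that precedes every prompt in the README's template.
SYSTEM_PROMPT = "You are a helpful assistant."


def format_prompt(prompt: str, response: str = "") -> str:
    """Fill the README template.

    With a response, returns a full training example ending in </s>.
    Without one, returns the prefix up to [/INST], ready for generation.
    """
    text = f"{SYSTEM_PROMPT}<s>[INST] {prompt} [/INST]"
    if response:
        text += f"{response}</s>"
    return text


if __name__ == "__main__":
    print(format_prompt("What is DPO?", "A preference-optimization method."))
```

Leaving `response` empty gives the inference-time prefix, so the same helper can serve both dataset preparation and generation. Whether the special tokens should appear as literal text depends on the tokenizer configuration, so it is worth checking the tokenized output before relying on this.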