This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth and Hugging Face TRL.

Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k

Rank: 8

Alpha: 16

Learning rate: 5e-6

Beta: 0.1

Batch size: 8

Epochs: 1

Learning rate scheduler: Linear

Prompt format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
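The prompt format above can be applied with a small helper. This is a minimal sketch, assuming the Mistral-style `[INST]` tags shown; `format_prompt` and `SYSTEM` are illustrative names, not part of this repository:

```python
# Illustrative helper for the prompt format above; format_prompt and
# SYSTEM are hypothetical names, not part of this model repository.
SYSTEM = "You are a helpful assistant."

def format_prompt(prompt: str, response: str = "") -> str:
    """Build a string in the card's format:
    SYSTEM<s>[INST] PROMPT [/INST]RESPONSE</s>"""
    text = f"{SYSTEM}<s>[INST] {prompt} [/INST]"
    if response:
        # Closing </s> is appended only when a response is attached,
        # as in a training example.
        text += f"{response}</s>"
    return text
```

For inference, you would pass only the prompt and let the model generate the text after `[/INST]`.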
 
 
 
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)