This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth and Hugging Face TRL.

Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k

Rank: 8

Alpha: 16

Learning rate: 5e-6

Beta: 0.1

Batch size: 8

Epochs: 1

Learning rate scheduler: Linear

Prompt format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
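The prompt format above can be applied with a small helper. This is a minimal sketch, assuming the Mistral-style `[INST]` tags shown; `format_prompt` and `SYSTEM` are illustrative names, not part of this repository:

```python
# Illustrative helper for the prompt format above; format_prompt and
# SYSTEM are hypothetical names, not part of this model repository.
SYSTEM = "You are a helpful assistant."

def format_prompt(prompt: str, response: str = "") -> str:
    """Build a string in the card's format:
    SYSTEM<s>[INST] PROMPT [/INST]RESPONSE</s>"""
    text = f"{SYSTEM}<s>[INST] {prompt} [/INST]"
    if response:
        # Closing </s> is appended only when a response is attached,
        # as in a training example.
        text += f"{response}</s>"
    return text
```

For inference, you would pass only the prompt and let the model generate the text after `[/INST]`.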
 
 
 
 
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)