G-reen/EXPERIMENT-DPO-m7b2-1-merged

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision


G-reen committed on Mar 25

Commit bbc9f45 • 1 Parent(s): 5d693b4

Update README.md

Files changed (1)

README.md +13 -1

README.md CHANGED

@@ -1,15 +1,27 @@
-This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth/Huggingface TRL.
+*This model was trained as part of a series of experiments testing the performance of pure DPO vs SFT vs ORPO, all supported by Unsloth/Huggingface TRL.*
+**Training Details**
 Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k
 Rank: 8
 Alpha: 16
 Learning rate: 5e-6
 Beta: 0.1
 Batch size: 8
 Epochs: 1
 Learning rate scheduler: Linear
 Prompt Format: ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
+**WanDB Reports**
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/8DQ0WiypkVIJeK_Y18Wv0.png)
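
The prompt format listed in the README can be assembled with a small helper. This is a minimal sketch, not the author's training code: the function name `build_prompt` and its parameters are hypothetical, and it assumes the `<s>` and `</s>` markers are written literally into the string as shown, rather than being added by the tokenizer.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."


def build_prompt(prompt, response=None, system=DEFAULT_SYSTEM):
    """Fill the README template: SYSTEM<s>[INST] PROMPT [/INST]RESPONSE</s>.

    When `response` is None, the string stops after "[/INST]" so a model
    can generate the response itself (an assumption about inference use).
    """
    text = f"{system}<s>[INST] {prompt} [/INST]"
    if response is not None:
        # A full training-style example also carries the response and the closing marker.
        text += f"{response}</s>"
    return text


if __name__ == "__main__":
    print(build_prompt("What is DPO?"))
    print(build_prompt("What is DPO?", "Direct Preference Optimization."))
```

If the tokenizer in use already inserts a beginning-of-sequence token, the literal `<s>` here may end up duplicated, so it is worth checking the tokenized output before relying on this.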