G-reen/EXPERIMENT-DPO-m7b2-1-merged

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

G-reen committed on Mar 25

Commit 8becdf3 • 1 Parent(s): 2a3c3d8

Update README.md

Files changed (1)

README.md +8 -0

@@ -1,13 +1,21 @@
 This model was trained as part of a series of experiments comparing the performance of pure DPO, SFT, and ORPO, all of which are supported by Unsloth and Hugging Face TRL.
 Rank: 8
 Alpha: 16
 Learning rate: 5e-6
 Beta: 0.1
 Batch size: 8
 Epochs: 1
 Learning rate scheduler: Linear
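For reference, the hyperparameters above can be collected into one object. This is a minimal sketch, not the training script used for this model: the `DPORunConfig` class and its method names are hypothetical. The keyword names it emits (`r`, `lora_alpha`, `learning_rate`, `per_device_train_batch_size`, `num_train_epochs`, `lr_scheduler_type`, `beta`) follow the usual PEFT, Transformers, and TRL naming, and mapping "Batch size: 8" to a per-device batch size is an assumption, since the card does not say whether gradient accumulation was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DPORunConfig:
    """Hyperparameters listed on this card (the class itself is hypothetical)."""

    lora_rank: int = 8
    lora_alpha: int = 16
    learning_rate: float = 5e-6
    beta: float = 0.1
    batch_size: int = 8
    epochs: int = 1
    lr_scheduler: str = "linear"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the adapter update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def lora_kwargs(self) -> dict:
        # Keyword names as used by peft.LoraConfig.
        return {"r": self.lora_rank, "lora_alpha": self.lora_alpha}

    def trainer_kwargs(self) -> dict:
        # Keyword names as used by Transformers/TRL training arguments.
        # Assumption: the card's batch size is the per-device batch size.
        return {
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.batch_size,
            "num_train_epochs": self.epochs,
            "lr_scheduler_type": self.lr_scheduler,
            "beta": self.beta,
        }


if __name__ == "__main__":
    cfg = DPORunConfig()
    print(cfg.lora_kwargs())
    print(cfg.trainer_kwargs())
    print("LoRA scaling (alpha / rank):", cfg.lora_scaling)
```

With rank 8 and alpha 16, the LoRA update is scaled by a factor of 2.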
 Prompt Format:
 ```You are a helpful assistant.<s>[INST] PROMPT [/INST]RESPONSE</s>```
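The prompt format above can be assembled with a small helper. This is a sketch under stated assumptions: `build_prompt` is a hypothetical name, and `<s>` and `</s>` are written as literal text exactly as the card shows them. Many tokenizers insert the beginning-of-sequence token on their own, so check for a duplicated `<s>` before using this with a real tokenizer.

```python
from typing import Optional

SYSTEM_PROMPT = "You are a helpful assistant."


def build_prompt(prompt: str, response: Optional[str] = None,
                 system: str = SYSTEM_PROMPT) -> str:
    """Format one turn following the template shown on this card.

    With a response, returns the full training-style string ending in </s>.
    Without one, returns the text up to [/INST], ready for generation.
    """
    text = f"{system}<s>[INST] {prompt} [/INST]"
    if response is not None:
        text += f"{response}</s>"
    return text


if __name__ == "__main__":
    # Inference-style prompt: the model continues after [/INST].
    print(build_prompt("What is DPO?"))
    # Training-style example with the response and end-of-sequence marker.
    print(build_prompt("What is DPO?", "Direct Preference Optimization."))
```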
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Tg3dknWsTvfqM96Fab2YJ.png)
