Update README.md
README.md CHANGED
Removed (old model card front matter and upload notes):

```
---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- dpo
base_model: unsloth/mistral-7b-v0.2-bnb-4bit
---

- **License:** apache-2.0
- **Finetuned from model :** unsloth/mistral-7b-v0.2-bnb-4bit
```

Added:
---
license: "apache-2.0"
---

*This model was trained as part of a series of experiments testing the performance of pure DPO vs. SFT vs. ORPO, all supported by Unsloth and the Hugging Face TRL library.*
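For context, DPO trains directly on preference pairs with a logistic loss on the policy-vs-reference log-probability margin. A minimal plain-Python sketch of the per-pair loss (illustrative only, not the Unsloth/TRL implementation; argument names are made up for clarity):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the full response
    under the trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), written as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))

# When the policy matches the reference, the margin is 0 and the
# loss is -log(0.5) = log(2); widening the preference gap lowers it.
loss = dpo_loss(-9.0, -13.0, -10.0, -12.0, beta=0.1)
```

Beta scales how strongly the policy is pushed away from the reference model; the 0.1 used here is the common default.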
Note: this model is extremely buggy and not recommended for use. However, it did not massively overfit like #3, so it may still be usable.

Training was somewhat unstable; the optimal learning-rate range appears to be around [1e-5, 1e-4].

**Benchmarks**

TBA

**Training Details**

- Duration: ~10-12 hours on one Kaggle T4 with Unsloth
- Model: https://huggingface.co/unsloth/mistral-7b-v0.2-bnb-4bit
- Dataset: https://huggingface.co/datasets/argilla/dpo-mix-7k
- Rank: 8
- Alpha: 16
- Learning rate: 1e-4
- Beta: 0.1
- Batch size: 8
- Epochs: 1
- Learning rate scheduler: Linear
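A run with these settings could be set up with Unsloth and TRL roughly as follows. This is a hypothetical sketch, not the actual training script: exact argument names vary across TRL/Unsloth versions, and details such as the LoRA target modules and output directory are assumptions.

```python
from unsloth import FastLanguageModel
from trl import DPOTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model listed on the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/mistral-7b-v0.2-bnb-4bit", load_in_4bit=True
)
# Rank 8 / Alpha 16, per the hyperparameters above; target modules assumed.
model = FastLanguageModel.get_peft_model(
    model, r=8, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with PEFT adapters, the reference model is implicit
    beta=0.1,
    train_dataset=load_dataset("argilla/dpo-mix-7k", split="train"),
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        lr_scheduler_type="linear",
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```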
Prompt Format: ChatML

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
```
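In code, the format above can be produced with a small helper (an illustrative sketch; in practice the tokenizer's `apply_chat_template` with a ChatML chat template does the same job):

```python
def format_chatml(messages):
    """Render a list of {role, content} dicts in ChatML,
    ending with an open assistant turn for generation."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"
    return out

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
])
```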
**WandB Reports**

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/Ww-urn-b22jj2sr735rNs.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/hVzS-W9SGA8TZn65ixF84.png)

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)