Commit f818073 • 1 Parent(s): 62c23f5
Update README.md
README.md CHANGED
@@ -172,7 +172,7 @@ model-index:
 
 # Model Card for Notus 7B v1
 
-Notus is a collection of fine-tuned models using Direct Preference Optimization (DPO) and related RLHF techniques. This model is the first version, fine-tuned with DPO
+Notus is a collection of fine-tuned models using Direct Preference Optimization (DPO) and related RLHF techniques. This model is the first version, fine-tuned with DPO over `zephyr-7b-sft-full`, which is the SFT model produced to create `zephyr-7b-beta`.
 
 Following a **data-first** approach, the only difference between Notus-7B-v1 and Zephyr-7B-beta is the preference dataset used for dDPO.
 
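
The updated paragraph names DPO as the fine-tuning technique applied on top of the SFT model. For context, here is a minimal sketch of the DPO objective on a batch of preference pairs; this is not Notus's actual training code, and the function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is the summed log-probability of the chosen/rejected
    completion under the policy being trained or the frozen reference
    (SFT) model, e.g. one derived from `zephyr-7b-sft-full`.
    """
    # Log-ratios of policy vs. reference for both completions.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # DPO maximizes the margin between the two log-ratios,
    # scaled by beta, via a logistic loss.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with scalar log-probs (real training uses per-example tensors
# computed from the model's token log-probabilities).
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-12.0]),
                torch.tensor([-11.0]), torch.tensor([-11.5]))
```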