Update README.md
README.md CHANGED
@@ -22,7 +22,7 @@ pipeline_tag: text-generation
 This is a second RL fine tuned model of [Teknium](https://huggingface.co/teknium)'s [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) using the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [allenai/ultrafeedback_binarized_cleaned](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned) preference datasets for reinforcement learning using Direct Preference Optimization (DPO)
 
 The difference between this model and the "v1" model is that the v1 model used argilla's version of the dataset that was not decontaminated of TruthfulQA data.
-DPOpenHermes is trained using LoRA.
+DPOpenHermes is trained using 16-bit LoRA.
 
 # Training Details
 
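
For orientation, a minimal sketch of what a DPO fine-tune with 16-bit LoRA adapters over one of these preference datasets could look like. This is an illustrative assumption, not the actual DPOpenHermes training code: the use of TRL's `DPOTrainer`, the PEFT `LoraConfig`, the `question` → `prompt` column mapping, and every hyperparameter below are placeholders, and the exact `DPOTrainer` arguments vary across TRL versions.

```python
# Hypothetical sketch (assumes TRL ~0.7.x + PEFT); not the released training configuration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 16-bit (bfloat16); LoRA adapters are added via peft_config below.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# DPOTrainer expects string columns named prompt / chosen / rejected.
# Column mapping for Intel/orca_dpo_pairs is an assumption for this sketch.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.rename_columns({"question": "prompt"})

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT config, the adapter-disabled base model serves as the frozen reference
    args=TrainingArguments(
        output_dir="dpo-openhermes-sketch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    beta=0.1,  # placeholder DPO temperature
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Passing `ref_model=None` together with a `peft_config` is the usual way to avoid holding a second full copy of the 7B model in memory, since the frozen base weights under the adapters act as the reference policy.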