Update README.md
README.md CHANGED
@@ -22,7 +22,7 @@ pipeline_tag: text-generation
 This is a second RL fine tuned model of [Teknium](https://huggingface.co/teknium)'s [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) using the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [allenai/ultrafeedback_binarized_cleaned](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned) preference datasets for reinforcement learning using Direct Preference Optimization (DPO)
 
 The difference between this model and the "v1" model is that the v1 model used argilla's version of the dataset that was not decontaminated of TruthfulQA data.
-DPOpenHermes is trained using LoRA.
+DPOpenHermes is trained using 16-bit LoRA.
 
 # Training Details
 
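
For orientation, a minimal sketch of what a DPO fine-tune with 16-bit LoRA adapters over one of these preference datasets could look like. This is an illustrative assumption, not the actual DPOpenHermes training code: the use of TRL's `DPOTrainer`, the PEFT `LoraConfig`, the `question` → `prompt` column mapping, and every hyperparameter below are placeholders, and the exact `DPOTrainer` arguments vary across TRL versions.

```python
# Hypothetical sketch (assumes TRL ~0.7.x + PEFT); not the released training configuration.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model in 16-bit (bfloat16); LoRA adapters are added via peft_config below.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# DPOTrainer expects string columns named prompt / chosen / rejected.
# Column mapping for Intel/orca_dpo_pairs is an assumption for this sketch.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.rename_columns({"question": "prompt"})

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a PEFT config, the adapter-disabled base model serves as the frozen reference
    args=TrainingArguments(
        output_dir="dpo-openhermes-sketch",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=5e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    beta=0.1,  # placeholder DPO temperature
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Passing `ref_model=None` together with a `peft_config` is the usual way to avoid holding a second full copy of the 7B model in memory, since the frozen base weights under the adapters act as the reference policy.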