winglian committed on
Commit
3ebea17
1 Parent(s): a09347d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ pipeline_tag: text-generation
  This is a second RL fine tuned model of [Teknium](https://huggingface.co/teknium)'s [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) using the [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) and [allenai/ultrafeedback_binarized_cleaned](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned) preference datasets for reinforcement learning using Direct Preference Optimization (DPO)
 
  The difference between this model and the "v1" model is that the v1 model used argilla's version of the dataset that was not decontaminated of TruthfulQA data.
- DPOpenHermes is trained using LoRA.
+ DPOpenHermes is trained using 16-bit LoRA.
 
  # Training Details
 