Commit c27e00c, committed by winglian
1 Parent(s): 2d77846

Update README.md

Files changed (1)
  1. README.md +11 -2
README.md CHANGED
@@ -3,10 +3,19 @@ library_name: peft
 base_model: teknium/OpenHermes-2.5-Mistral-7B
 ---
 
-# Model Card for Model ID
+# DPOpenHermes 7B
 
-<!-- Provide a quick summary of what the model is/does. -->
+## OpenHermes x Notus x Neural
 
+This is an RL fine tuned OpenHermes using the Intel/orca_dpo_pairs and argilla/ultrafeedback-binarized-preferences preference datasets for reinforcement learning using Direct Preference Optimization (DPO)
+
+DPOpenHermes is trained using qLoRA. The adapter is also provided in this model repo.
+
+# Training Details
+
+DPOpenHermes was trained on a single H100 80GB hosted on RunPod for ~10h for 0.6 epochs of the dataset.
+
+https://wandb.ai/oaaic/openhermes-dpo/reports/DPOpenHermes--Vmlldzo2MTQ3NDg2
 
 
 ## Model Details
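
Reviewer note: the new README text names Direct Preference Optimization (DPO) without saying what is being optimized. The sketch below shows the standard per-pair DPO loss in plain Python, for readers who want a concrete reference. It is not taken from this repository's training code. The function names, the example log-probabilities, and `beta=0.1` are illustrative assumptions, not values confirmed by the commit.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the commit does not state the beta used
) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    The "chosen" and "rejected" responses come from a preference pair;
    "ref" is the frozen reference model (here, the model before DPO).
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # Loss falls as the policy favors the chosen response more than the reference does.
    return -log_sigmoid(beta * (chosen_shift - rejected_shift))


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))   # policy prefers the chosen response
```

When the policy matches the reference, the loss equals log 2; it drops below that as the policy raises the chosen response's likelihood relative to the rejected one.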