ArianAskari commited on
Commit
1c42183
1 Parent(s): 40d42d6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +8 -0
README.md CHANGED
@@ -1,3 +1,11 @@
 
 
 
 
 
 
 
 
1
  ---
2
  license: mit
3
  language:
 
1
+ A variation of NeuralHermes 2.5 - Mistral 7B
2
+
3
+ This is a variation of NeuralHermes which is based on the teknium/OpenHermes-2.5-Mistral-7B model that has been further fine-tuned with Direct Preference Optimization (DPO) using the mlabonne/chatml_dpo_pairs dataset. It surpasses the original model on most benchmarks (see results).
4
+
5
+ It is directly inspired by the RLHF process described by Intel/neural-chat-7b-v3-1's authors to improve performance. I used the same dataset and reformatted it to apply the ChatML template.
6
+
7
+ The code to train this model is available on Google Colab and GitHub. It required an A100 GPU for about an hour.
8
+
9
  ---
10
  license: mit
11
  language: