weqweasdas committed
Commit 35700d2
1 Parent(s): 60ccec5

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -8,7 +8,7 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).
+In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf) (helpful part only), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).
 
 ## Model Details
 
@@ -16,7 +16,7 @@ In this repo, we present a reward model trained by the framework [LMFlow](https:
 
 <!-- Provide a longer summary of what this model is. -->
 
-The HH-RLHF dataset contains 112K comparison samples in the training set and 12.5K comparison samples in the test set. We first replace the ``\n\nHuman'' and ``\n\nAssistant'' in the dataset by ``###Human'' and ``###Assistant'', respectively.
+The HH-RLHF-Helpful dataset contains 112K comparison samples in the training set and 12.5K comparison samples in the test set. We first replace the ``\n\nHuman'' and ``\n\nAssistant'' in the dataset by ``###Human'' and ``###Assistant'', respectively.
 
 Then, we split the dataset as follows:
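For context, both versions of the changed paragraph describe a preprocessing step: replacing the HH-RLHF role markers `\n\nHuman` and `\n\nAssistant` with `###Human` and `###Assistant`. Below is a minimal sketch of that transformation; the `convert_markers` helper is hypothetical (not an LMFlow API), and the `chosen`/`rejected` column names are an assumption about the Dahoas/full-hh-rlhf schema.

```python
from datasets import load_dataset


def convert_markers(text: str) -> str:
    # Literal substitution as stated in the README; note the leading
    # "\n\n" of each marker is consumed by the replacement.
    return (
        text.replace("\n\nHuman", "###Human")
            .replace("\n\nAssistant", "###Assistant")
    )


# Quick check on a toy sample.
sample = "\n\nHuman: Is the model helpful?\n\nAssistant: That is what the reward model scores."
print(convert_markers(sample))
# -> ###Human: Is the model helpful?###Assistant: That is what the reward model scores.

# Applying it over the comparison dataset (column names are an assumption).
ds = load_dataset("Dahoas/full-hh-rlhf", split="train")
ds = ds.map(lambda ex: {k: convert_markers(ex[k]) for k in ("chosen", "rejected")})
```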