weqweasdas committed
Commit 35700d2
1 Parent(s): 60ccec5

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -8,7 +8,7 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).
+In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf) (helpful part only), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).
 
 ## Model Details
 
@@ -16,7 +16,7 @@ In this repo, we present a reward model trained by the framework [LMFlow](https:
 
 <!-- Provide a longer summary of what this model is. -->
 
-The HH-RLHF dataset contains 112K comparison samples in the training set and 12.5K comparison samples in the test set. We first replace the ``\n\nHuman'' and ``\n\nAssistant'' in the dataset by ``###Human'' and ``###Assistant'', respectively.
+The HH-RLHF-Helpful dataset contains 112K comparison samples in the training set and 12.5K comparison samples in the test set. We first replace the ``\n\nHuman'' and ``\n\nAssistant'' in the dataset by ``###Human'' and ``###Assistant'', respectively.
 
 Then, we split the dataset as follows:
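For context, both versions of the changed paragraph describe a preprocessing step: replacing the HH-RLHF role markers `\n\nHuman` and `\n\nAssistant` with `###Human` and `###Assistant`. Below is a minimal sketch of that transformation; the `convert_markers` helper is hypothetical (not an LMFlow API), and the `chosen`/`rejected` column names are an assumption about the Dahoas/full-hh-rlhf schema.

```python
from datasets import load_dataset


def convert_markers(text: str) -> str:
    # Literal substitution as stated in the README; note the leading
    # "\n\n" of each marker is consumed by the replacement.
    return (
        text.replace("\n\nHuman", "###Human")
            .replace("\n\nAssistant", "###Assistant")
    )


# Quick check on a toy sample.
sample = "\n\nHuman: Is the model helpful?\n\nAssistant: That is what the reward model scores."
print(convert_markers(sample))
# -> ###Human: Is the model helpful?###Assistant: That is what the reward model scores.

# Applying it over the comparison dataset (column names are an assumption).
ds = load_dataset("Dahoas/full-hh-rlhf", split="train")
ds = ds.map(lambda ex: {k: convert_markers(ex[k]) for k in ("chosen", "rejected")})
```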