weqweasdas committed
Commit 5519e53
1 Parent(s): f3a61bf

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -6,7 +6,7 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-The reward model is trained from the base model [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+The reward model is trained from the base model [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).
 
 The training script is available at https://github.com/WeiXiongUST/RLHF-Reward-Modeling .
 
@@ -18,7 +18,7 @@ If you have any question with this reward model and also any question about rewa
 
 <!-- Provide a longer summary of what this model is. -->
 
-The model is trained on a mixture of [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
+The model is trained on a mixture of the dataset similar to [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it).
 
 - [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf)
 - [SHP](https://huggingface.co/datasets/stanfordnlp/SHP)
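Reward models trained on pairwise preference datasets such as HH-RLHF and SHP commonly optimize the Bradley-Terry objective: maximize the likelihood that the chosen response scores above the rejected one. A minimal sketch of that loss, assuming this is the objective used by the linked training script (not confirmed by the diff itself):

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response outranks the rejected one,
    under the Bradley-Terry model: P(chosen > rejected) = sigmoid(r_c - r_r)."""
    gap = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))

# The loss shrinks as the reward gap (chosen minus rejected) grows,
# and equals log(2) when the two rewards are tied.
print(round(bradley_terry_loss(2.0, 0.0), 4))  # → 0.1269
print(round(bradley_terry_loss(0.0, 0.0), 4))  # → 0.6931
```

In practice the same expression is computed over batches of logit pairs from the reward head; the scalar version above only illustrates the shape of the objective.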