weqweasdas committed on
Commit 13f510a · 1 Parent(s): 7cad203

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -42,9 +42,9 @@ We further test the generalization ability of the reward model but with another
 
 | Dataset training/test | open assistant | chatbot | hh_rlhf |
 | -------------- | -------------- | ------- | ------- |
-| open assistant | 69.5 | 61.1 | 58.7 |
+| open assistant | **69.5** | 61.1 | 58.7 |
 | chatbot | 66.5 | 62.7 | 56.0 |
-| hh_rlhf | 69.4 | 64.2 | 77.6 |
+| hh_rlhf | 69.4 | **64.2** | **77.6** |
 
 As we can see, the reward model trained on the HH-RLHF achieves matching or even better accuracy on open assistant and chatbot datasets, even though it is not trained on them directly. Therefore, the reward model may also be used for these two datasets.
 
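
The accuracies in the table are, by the usual convention for reward models, pairwise preference accuracies: the fraction of (chosen, rejected) response pairs in a test set for which the reward model scores the chosen response higher. Below is a minimal sketch of that evaluation, assuming a generic sequence-classification reward model and the standard chosen/rejected schema of hh_rlhf; the model name is a placeholder, not this repository's checkpoint.

```python
# Sketch: pairwise preference accuracy of a reward model on hh_rlhf.
# "your-reward-model" is a placeholder; swap in the actual checkpoint.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_model_name = "your-reward-model"  # placeholder, not this repo's model
tokenizer = AutoTokenizer.from_pretrained(reward_model_name)
model = AutoModelForSequenceClassification.from_pretrained(reward_model_name, num_labels=1)
model.eval()

def score(text: str) -> float:
    """Scalar reward assigned to one full chat transcript."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()

# hh_rlhf test split exposes paired "chosen"/"rejected" transcripts.
test_set = load_dataset("Anthropic/hh-rlhf", split="test")

# Count a pair as correct when the chosen response outranks the rejected one.
correct = sum(score(ex["chosen"]) > score(ex["rejected"]) for ex in test_set)
print(f"pairwise accuracy: {correct / len(test_set):.3f}")
```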