banghua committed on
Commit
b5fc156
1 Parent(s): 36eccc9

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -20,6 +20,7 @@ with the K-wise maximum likelihood estimator proposed in [this paper](https://ar
  less harmful will get the highest reward score. Note that since the preference dataset [berkeley-nest/Nectar](https://huggingface.co/berkeley-nest) is based on GPT-4 preference, the reward model is likely to be biased
  towards GPT-4's own preference, including longer responses and certain response format.
 
+ For more detailed discussions, please check out our [blog post](starling.cs.berkeley.edu), and stay tuned for our upcoming code and paper!
 
 
  - **Developed by:** Banghua Zhu * , Evan Frick * , Tianhao Wu * , Hanlin Zhu and Jiantao Jiao.
@@ -35,16 +36,15 @@ towards GPT-4's own preference, including longer responses and certain response
  - **Blog:** https://starling.cs.berkeley.edu/
  - **Paper:** Coming soon!
  - **Code:** Coming soon!
- -
+
  ## Uses
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- Please use the following code for running inference with the reward model.
+ Please use the following code for inference with the reward model.
 
  ```
- # Load in the reward model
-
-
+ ## Define the reward model function class
+ Test.
  ```
 
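Note that the code block added in this commit is still a placeholder ("## Define the reward model function class" / "Test."), and the released code is listed as "Coming soon!". For readers who want a starting point in the meantime, below is a minimal sketch of what inference with the reward model might look like; it is not the authors' implementation. The model ID, the use of `AutoModelForSequenceClassification` with a single-label head, and the plain prompt/response encoding are all assumptions: the actual checkpoint may require a custom model class and a specific chat template.

```
# Hypothetical sketch, NOT the authors' released code: score one
# prompt/response pair with a reward model exposed as a Hugging Face
# sequence-classification head that outputs a single scalar logit.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model ID; the real checkpoint may need a custom model class.
MODEL_ID = "berkeley-nest/Starling-RM-7B-alpha"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=1)
model.eval()

def get_reward(prompt: str, response: str) -> float:
    """Return a scalar reward score for a single prompt/response pair."""
    # Encode the pair; the real model may instead expect a chat template.
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 1)
    return logits.squeeze().item()

# Higher scores should indicate more helpful / less harmful responses.
print(get_reward("What is 2+2?", "2+2 equals 4."))
```

Since the README notes the model inherits GPT-4's preference biases (e.g. toward longer responses), scores are most meaningful relative to one another, such as when ranking several candidate responses to the same prompt.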