Update README.md
README.md CHANGED

```diff
@@ -20,7 +20,7 @@ with the K-wise maximum likelihood estimator proposed in [this paper](https://ar
 less harmful will get the highest reward score. Note that since the preference dataset [berkeley-nest/Nectar](https://huggingface.co/berkeley-nest) is based on GPT-4 preference, the reward model is likely to be biased
 towards GPT-4's own preference, including longer responses and certain response format.
 
-For more detailed discussions, please check out our [blog post](starling.cs.berkeley.edu), and stay tuned for our upcoming code and paper!
+For more detailed discussions, please check out our [blog post](https://starling.cs.berkeley.edu), and stay tuned for our upcoming code and paper!
 
 - **Developed by:** Banghua Zhu * , Evan Frick * , Tianhao Wu * , Hanlin Zhu and Jiantao Jiao.
```
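The hunk context above cites the K-wise maximum likelihood estimator from the linked paper; that estimator fits a Plackett-Luce model over the K ranked responses collected for each prompt, with the reward score acting as the utility of a single response. Below is a minimal sketch of the per-prompt training loss under that model (the function name and tensor layout are illustrative, not taken from the Starling codebase):

```python
import torch

def k_wise_mle_loss(rewards: torch.Tensor) -> torch.Tensor:
    """Plackett-Luce negative log-likelihood for one prompt.

    rewards: shape (K,), reward-model scores for K candidate
    responses, sorted best-to-worst by the preference ranking.
    """
    K = rewards.shape[0]
    nll = rewards.new_zeros(())
    for k in range(K):
        # Factor k: the k-th ranked response is preferred over all
        # remaining lower-ranked responses (softmax over rewards[k:]).
        nll = nll - (rewards[k] - torch.logsumexp(rewards[k:], dim=0))
    return nll

# Example: three responses to one prompt, already sorted by rank.
scores = torch.tensor([1.8, 0.3, -0.7])
print(k_wise_mle_loss(scores))
```

With K = 2 this reduces to the familiar Bradley-Terry pairwise loss used in most reward-model training, so the K-wise estimator can be read as its ranking generalization.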