Commit 7d67d6f by theblackcat102
Parent(s): c9b42a8
Update README.md
README.md CHANGED
@@ -1,3 +1,11 @@
 ---
 license: mit
 ---
+
+
+Reward model pretrained on openai/webgpt_comparison and the human-feedback summary dataset. Unlike the other electra-large model, this model is trained with a rank loss and on one additional dataset.
+
+On the validation set, the results are much more stable than usual.
+
+See this [wandb run](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) for more details.
+
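The README says this reward model is trained with a rank loss but does not state its exact form. A common choice for reward models trained on pairwise human comparisons is the loss -log(sigmoid(r_chosen - r_rejected)), averaged over pairs; when a prompt has K ranked responses, it can be averaged over all K(K-1)/2 ordered pairs (InstructGPT-style). The sketch below assumes that formulation. It is not taken from this model's training code, and the function names `pairwise_rank_loss` and `ranked_list_loss` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def pairwise_rank_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over aligned pairs.

    Assumption: a Bradley-Terry style pairwise loss; the README does not
    specify which rank loss was actually used for this model.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equally sized, non-empty score lists")
    total = 0.0
    for r_c, r_r in zip(chosen, rejected):
        margin = r_c - r_r
        # -log(sigmoid(m)) == log(1 + exp(-m)); split on the sign of m
        # so exp() never receives a large positive argument.
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(chosen)


def ranked_list_loss(scores_best_first: Sequence[float]) -> float:
    """Average pairwise loss over every pair in a ranked list of scores.

    Assumption: scores are ordered from most to least preferred response;
    combinations() keeps that order, so the first item of each pair is
    treated as the preferred one.
    """
    pairs = list(combinations(scores_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked scores")
    better = [a for a, _ in pairs]
    worse = [b for _, b in pairs]
    return pairwise_rank_loss(better, worse)
```

With equal scores the loss is log 2 (about 0.693); it falls toward 0 as the preferred response's score pulls ahead and grows roughly linearly when the ranking is inverted, which is what pushes the model to order responses the way annotators did.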