theblackcat102 committed b1e1683 (parent: 57a3e97)

Create README.md

---
language:
- en
tags:
- webgpt
- regression
- reward-model
license: "apache-2.0"
datasets:
- openai/webgpt_comparisons
metrics:
- accuracy
---
# Reward Model Trained on openai/webgpt_comparisons

A reward model fine-tuned from an existing pretrained model.

Observations consistent with the original papers:

* The model overfits easily when trained with a rank loss.
* Training requires a small learning rate.

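The README does not spell out the rank loss. A common formulation for pairwise reward models treats the reward difference as a logit and applies binary cross-entropy, which with a hard label reduces to -log σ(r_preferred − r_other). The sketch below assumes that formulation; the function names are hypothetical, and using a soft target of 0.5 for ties is an illustrative assumption rather than something this repository documents.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rank_loss(r_preferred: float, r_other: float, target: float = 1.0) -> float:
    """Pairwise rank loss for one comparison (assumed formulation).

    The reward difference is the logit of P(preferred answer wins).
    target=1.0 is the usual hard label; target=0.5 would encode a tie.
    """
    delta = r_preferred - r_other
    # Binary cross-entropy with logit delta:
    #   -(t * log sigmoid(delta) + (1 - t) * log sigmoid(-delta))
    return -(target * log_sigmoid(delta) + (1.0 - target) * log_sigmoid(-delta))


def mean_rank_loss(pairs):
    """Average rank loss over (r_preferred, r_other) reward pairs."""
    return sum(rank_loss(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Equal rewards: the model expresses no preference.
    print(round(rank_loss(0.0, 0.0), 4))  # 0.6931
    # A confident correct ranking gives a small loss ...
    print(rank_loss(3.0, -3.0))
    # ... and a confident wrong ranking gives a large one.
    print(rank_loss(-3.0, 3.0))
```

Under this assumed formulation, a model that assigns both answers the same reward scores exactly ln 2 ≈ 0.693, which is a handy reference point when reading a rank-loss column.
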
Differences from the papers:

* Small models perform poorly due to a lack of world knowledge: validation accuracy does not even reach 60%, whereas OpenAI's reward model had 6B parameters.
* Models were trained on an 80/20 train/validation split with PyTorch automatic mixed precision (AMP).

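A minimal sketch of an 80/20 train/validation split, assuming the comparisons sit in a Python list and are shuffled with a fixed seed. The helper name `train_val_split` and the seed value are hypothetical. The mixed-precision part (in PyTorch, `torch.autocast` together with a `GradScaler`) is left out so the sketch stays dependency-free.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    items: Sequence[T], train_frac: float = 0.8, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and split it into train and validation lists."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(items)
    # A local RNG keeps the split reproducible without touching global state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    comparisons = list(range(10))  # stand-in for real comparison records
    train, val = train_val_split(comparisons)
    print(len(train), len(val))  # 8 2
```

Using a seeded, local `random.Random` means the same validation set is produced on every run, so validation metrics from different runs are comparable.
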
Other models I tried:

* bloomz-560m: its large embedding matrix is not worth the training cost, since this dataset contains only English prompts.
* gpt2-large: training was unstable.
* gpt2-base: training was unstable.

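A back-of-the-envelope check of the bloomz-560m point: a token embedding matrix holds vocab_size × hidden_size parameters. The vocabulary and hidden sizes below are assumed from the published bloomz-560m and roberta-base configurations; confirm them against each model's `config.json` before relying on the numbers.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Parameter count of a (vocab_size x hidden_size) token embedding matrix."""
    return vocab_size * hidden_size


# Assumed config values (verify against each model's config.json).
BLOOMZ_560M = {"vocab_size": 250_880, "hidden_size": 1024}
ROBERTA_BASE = {"vocab_size": 50_265, "hidden_size": 768}

if __name__ == "__main__":
    bloom = embedding_params(**BLOOMZ_560M)
    roberta = embedding_params(**ROBERTA_BASE)
    print(f"bloomz-560m embeddings: {bloom:,}")     # 256,901,120
    print(f"roberta-base embeddings: {roberta:,}")  # 38,603,520
    # Share of the nominal ~560M parameters held by the embedding matrix.
    print(f"share of ~560M: {bloom / 560e6:.0%}")   # 46%
```

With these assumed sizes, almost half of bloomz-560m's parameters are in the embedding, roughly 6.7 times the embedding of roberta-base.
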
# Performance on validation split

| Model | Val. accuracy (%) | Val. loss (rank loss) |
|---|---|---|
| [roberta-base](https://huggingface.co/theblackcat102/roberta-base-webgpt-rm) | 56.21 | 0.71 |
| [roberta-large](https://huggingface.co/theblackcat102/roberta-large-webgpt-rm) | 57.89 | 0.67 |
| [electra-base](https://huggingface.co/theblackcat102/electra-base-webgpt-rm) | 57.02 | 0.70 |
| [electra-large](https://huggingface.co/theblackcat102/electra-large-webgpt-rm) | 58.75 | 0.69 |

TensorBoard logs are located under `runs/`.

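The README does not define the accuracy metric precisely. For a pairwise reward model it is usually the percentage of validation comparisons in which the preferred answer receives the higher reward; the sketch below assumes that definition and assumes tied comparisons were excluded beforehand. The function name `pairwise_accuracy` is hypothetical.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Percentage of (r_preferred, r_other) pairs ranked correctly.

    A pair counts as correct only when the preferred answer's reward is
    strictly higher; equal rewards count as incorrect.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one comparison")
    correct = sum(1 for r_pref, r_other in pairs if r_pref > r_other)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    rewards = [(0.4, 0.1), (0.2, 0.5), (1.3, -0.2), (0.0, 0.0)]
    print(pairwise_accuracy(rewards))  # 50.0
```

Under this definition a scorer that ranks at random lands near 50%, which puts validation accuracies of 56% to 59% in context.
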
# Note

* You will need to re-center this model's output so that the mean reward equals 0.
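One way to do this, sketched under the assumption that you score a reference set of answers (for example, a sample of the training data) and subtract that set's mean from every later reward. Subtracting a constant leaves every pairwise reward difference unchanged, so rankings are unaffected. The class name `RewardCentering` is hypothetical.

```python
from statistics import fmean
from typing import Iterable, List


class RewardCentering:
    """Shift raw reward-model scores so the reference set has mean 0."""

    def __init__(self, reference_rewards: Iterable[float]) -> None:
        rewards = list(reference_rewards)
        if not rewards:
            raise ValueError("need at least one reference reward")
        self.offset = fmean(rewards)  # mean raw reward on the reference set

    def __call__(self, raw_rewards: Iterable[float]) -> List[float]:
        # Subtracting a constant preserves every pairwise reward difference.
        return [r - self.offset for r in raw_rewards]


if __name__ == "__main__":
    center = RewardCentering([1.5, 2.0, 2.5])
    print(center.offset)       # 2.0
    print(center([2.0, 3.0]))  # [0.0, 1.0]
```

Computing the offset once and reusing it keeps rewards from different batches on the same scale.
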