Text Classification
Transformers
PyTorch
English
electra
reward-model
reward_model
RLHF
Inference Endpoints
Commit b89ee07 by theblackcat102 · 1 parent: ae6162c

Update README.md

  1. README.md +43 -0
README.md CHANGED
@@ -1,3 +1,46 @@
  ---
  license: apache-2.0
+ datasets:
+ - openai/webgpt_comparisons
+ - openai/summarize_from_feedback
+ - Dahoas/instruct-synthetic-prompt-responses
+ language:
+ - en
+ metrics:
+ - accuracy
+ tags:
+ - reward-model
+ - reward_model
+ - RLHF
  ---
+ # Reward model trained from human feedback
+
+ Reward model (RM) trained to predict which generated answer a human would judge as better, given a question.
+
+ RMs are useful in these domains:
+
+ - QA model evaluation
+
+ - serving as the reward score in RLHF
+
+
+ All models are trained on these datasets, with the same split seed across datasets (used when a validation split wasn't available):
+
+ - [webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)
+
+ - [summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)
+
+ - [synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)
+
+
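The shared split seed mentioned in the README could work roughly as sketched below. This is a minimal illustration using only the Python standard library; the function name `split_train_val`, the seed value `42`, and the 10% validation fraction are assumptions chosen for the example, not values taken from the actual training code.

```python
import random


def split_train_val(examples, val_fraction=0.1, seed=42):
    """Deterministically split `examples` into (train, val) lists.

    Hypothetical helper: the seed and fraction are illustrative defaults.
    """
    indices = list(range(len(examples)))
    # A dedicated Random instance keeps the shuffle reproducible and
    # independent of the global random state.
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(len(examples) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val
```

Reusing one seed means that every model compared in the table would be validated on the same held-out pairs, which is what makes the accuracy columns comparable.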
+ # Performance
+
+ Validation split accuracy (%)
+
+ | Model | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) | [Summary](https://huggingface.co/datasets/openai/summarize_from_feedback) | [SyntheticGPT](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) |
+ |---|---|---|---|
+ | [electra-large-discriminator](https://huggingface.co/OpenAssistant/reward-model-electra-large-discriminator) | 59.30 | 68.66 | 99.85 |
+ | [deberta-v3-large](https://huggingface.co/OpenAssistant/reward-model-deberta-v3-large) | 61.13 | 72.23 | 99.94 |
+ | [deberta-v3-base](https://huggingface.co/OpenAssistant/reward-model-deberta-v3-base) | 59.07 | 66.84 | 99.85 |
+
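For readers unfamiliar with how a reward model is scored on comparison data, here is a minimal sketch. It assumes that accuracy means the percentage of validation pairs in which the RM gives the human-preferred (chosen) answer a strictly higher score than the rejected one; the README does not spell out the exact definition. The function name `pairwise_accuracy` and the scores are hypothetical toy values, not real model outputs.

```python
def pairwise_accuracy(chosen_scores, rejected_scores):
    """Percentage of pairs where the chosen answer outscores the rejected one.

    Assumed metric definition; ties count as incorrect.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("need at least one pair")
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * correct / len(chosen_scores)


# Toy reward scores for four (chosen, rejected) pairs.
chosen = [1.2, 0.3, -0.5, 2.0]
rejected = [0.7, 0.9, -1.1, 1.5]
print(pairwise_accuracy(chosen, rejected))  # 3 of 4 pairs ranked correctly -> 75.0
```

Under this definition a model that scores answers at random would land near 50%, which puts the WebGPT and Summary columns in context.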
+ It's likely that SyntheticGPT has some kind of surface pattern in its chosen-rejected pairs that makes it trivial to tell which answer is better.
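One way to probe this suspicion is to measure how often a trivial heuristic already picks the chosen answer. The sketch below assumes a "longer answer wins" rule as one hypothetical surface feature; the README does not say which pattern, if any, is present. The function name `length_baseline_accuracy` and the pairs are made-up toy data, not samples from the dataset.

```python
def length_baseline_accuracy(pairs):
    """Percentage of (chosen, rejected) text pairs where chosen is strictly longer.

    Hypothetical baseline: uses answer length as the only signal.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(len(chosen) > len(rejected) for chosen, rejected in pairs)
    return 100.0 * hits / len(pairs)


# Toy (chosen, rejected) pairs for illustration only.
toy_pairs = [
    ("A detailed, step-by-step answer.", "No."),
    ("Short.", "A longer but worse reply."),
]
print(length_baseline_accuracy(toy_pairs))  # one of two pairs -> 50.0
```

If such a baseline came close to the RM's accuracy on the real validation split, that would support the idea that the pairs are separable by surface features alone.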