Haoxiang-Wang committed
Commit a23d9a2
1 Parent(s): e1d2459

Update README.md

Files changed (1): README.md +23 -0
README.md CHANGED
@@ -5,6 +5,7 @@ This preference model is trained from [LLaMA3-8B-it](meta-llama/Meta-Llama-3-8B-
 
 The dataset is RLHFlow/pair_preference_model_dataset. It achieves Chat 98.6, Chat-Hard 65.8, Safety 89.6, and Reasoning 94.9 on RewardBench.
 
+See our paper [RLHF Workflow: From Reward Modeling to Online RLHF](https://arxiv.org/abs/2405.07863) for more details of this model.
 
 ## Serve the RM
 
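The "Serve the RM" section referenced above falls outside the diff hunks, but it scores a pair of responses by asking the model to pick between them as options "A" and "B" and reading the probability of the corresponding answer token. The sketch below is a minimal, hedged reconstruction of that usage in Python, not the repository's exact code: the checkpoint name, the [CONTEXT]/[RESPONSE A]/[RESPONSE B] prompt wording, and the single-letter answer tokens are assumptions made for illustration.

```python
# Hedged sketch: query a pairwise preference model served as a causal LM.
# The checkpoint name and prompt wording below are assumptions, not the
# repository's exact template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RLHFlow/pair-preference-model-LLaMA3-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()


def prob_A_preferred(context: str, response_A: str, response_B: str) -> float:
    """Return P(model answers 'A') for one ordering of the two responses."""
    instruction = (
        f"[CONTEXT] {context} "
        f"[RESPONSE A] {response_A} "
        f"[RESPONSE B] {response_B}\n"
        "Which response is better? Answer with A or B."
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": instruction}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    id_A = tokenizer.encode("A", add_special_tokens=False)[0]
    id_B = tokenizer.encode("B", add_special_tokens=False)[0]
    # Softmax restricted to the two answer tokens.
    probs = torch.softmax(next_token_logits[[id_A, id_B]].float(), dim=-1)
    return probs[0].item()
```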
@@ -62,4 +63,26 @@ for chosen_position in [0, 1]:
 avg_prob_chosen = np.mean(probs_chosen)
 correct = 0.5 if avg_prob_chosen == 0.5 else float(avg_prob_chosen > 0.5)
 print(correct)
+```
+
+## Citation
+If you use this model in your research, please consider citing our paper:
+```
+@misc{rlhflow,
+      title={RLHF Workflow: From Reward Modeling to Online RLHF},
+      author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
+      year={2024},
+      eprint={2405.07863},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG}
+}
+```
+and Google's SLiC-HF paper (which originally proposed this pairwise preference model):
+```
+@article{zhao2023slic,
+  title={SLiC-HF: Sequence Likelihood Calibration with Human Feedback},
+  author={Zhao, Yao and Joshi, Rishabh and Liu, Tianqi and Khalman, Misha and Saleh, Mohammad and Liu, Peter J},
+  journal={arXiv preprint arXiv:2305.10425},
+  year={2023}
+}
  ```
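The context lines in the second hunk sit at the end of a loop (visible in the hunk header) that scores each preference pair twice, once with the chosen response shown as option A and once as option B, so that any positional bias cancels when the two probabilities are averaged. A hedged sketch of how that loop could populate `probs_chosen`, reusing the hypothetical `prob_A_preferred` helper from the sketch above and toy example data:

```python
import numpy as np

# Toy example data (illustrative only).
context = "What is the capital of France?"
chosen = "The capital of France is Paris."
rejected = "The capital of France is Berlin."

# Score the pair in both orders so that position bias cancels out.
probs_chosen = []
for chosen_position in [0, 1]:
    if chosen_position == 0:
        # Chosen response presented as option A.
        p = prob_A_preferred(context, chosen, rejected)
    else:
        # Chosen response presented as option B: P(chosen) = 1 - P(A preferred).
        p = 1.0 - prob_A_preferred(context, rejected, chosen)
    probs_chosen.append(p)

# As in the hunk above: the pair counts as correct when the chosen response
# wins on average; an exact tie is scored as 0.5.
avg_prob_chosen = np.mean(probs_chosen)
correct = 0.5 if avg_prob_chosen == 0.5 else float(avg_prob_chosen > 0.5)
print(correct)
```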