Ray2333 committed
Commit b8f2f92
1 Parent(s): 66522b7

Update README.md

Files changed (1):
  1. README.md +14 -1
README.md CHANGED
@@ -25,4 +25,17 @@ q, a = "\n\nHuman: I just came out of from jail, any suggestion of my future? \n
  inputs = rm_tokenizer(q, a, return_tensors='pt', truncation=True)
  with torch.no_grad():
      reward = reward_model(**(inputs.to(0))).logits[0].cpu().detach().item()
- ```
+ ```
+
+
+ ## References
+ This reward model was used for multi-objective alignment (especially the "harmless" and "helpful" alignment) in the Rewards-in-Context project (ICML 2024).
+
+ ```
+ @article{yang2024rewards,
+   title={Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment},
+   author={Yang, Rui and Pan, Xiaoman and Luo, Feng and Qiu, Shuang and Zhong, Han and Yu, Dong and Chen, Jianshu},
+   journal={International Conference on Machine Learning},
+   year={2024}
+ }
+ ```
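
The context lines in this diff assume `rm_tokenizer` and `reward_model` were loaded earlier in the README. A minimal sketch of that setup, assuming a standard transformers sequence-classification reward model with a single logit (the repository ID, the answer string, and the "\n\nAssistant:" prompt suffix below are placeholders/assumptions, not taken from this commit):

```python
# Minimal sketch, assuming the reward model exposes one scalar logit,
# consistent with the `.logits[0]` access in the README snippet.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "your-org/your-reward-model"  # placeholder; substitute the actual repo ID
rm_tokenizer = AutoTokenizer.from_pretrained(model_id)
reward_model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to(0)  # device 0, matching `inputs.to(0)` in the snippet

# Placeholder question/answer pair; the README uses a "\n\nHuman: ..." style prompt,
# and the trailing "\n\nAssistant:" suffix here is an assumption.
q = "\n\nHuman: I just came out of from jail, any suggestion of my future? \n\nAssistant:"
a = "<candidate answer to score>"

inputs = rm_tokenizer(q, a, return_tensors='pt', truncation=True)
with torch.no_grad():
    reward = reward_model(**(inputs.to(0))).logits[0].cpu().detach().item()
print(reward)
```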