Haoxiang-Wang committed
Commit 86323c8
1 parent: f6bdb40

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -11,7 +11,7 @@ license: llama3
  [Haoxiang Wang*](https://haoxiang-wang.github.io/), [Wei Xiong*](https://weixiongust.github.io/WeiXiongUST/index.html), [Tengyang Xie](https://tengyangxie.github.io/), [Han Zhao](https://hanzhaoml.github.io/), [Tong Zhang](https://tongzhang-ml.org/)
 
  + **Blog**: https://rlhflow.github.io/posts/2024-05-29-multi-objective-reward-modeling/
- + **Tech Report**: To be released in June 2024
+ + **Tech Report**: https://arxiv.org/abs/2406.12845
  + **Model**: [ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1)
  + Finetuned from model: [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1)
  - **Code Repository:** https://github.com/RLHFlow/RLHF-Reward-Modeling/
@@ -101,10 +101,10 @@ print(helpsteer_rewards_pred)
 
  If you find this work useful for your research, please consider citing:
  ```
- @misc{wang2024interpretable,
- title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
- author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
- year={2024}
+ @article{ArmoRM,
+ title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
+ author={Haoxiang Wang and Wei Xiong and Tengyang Xie and Han Zhao and Tong Zhang},
+ journal={arXiv preprint arXiv:2406.12845},
  }
 
  @inproceedings{wang2024arithmetic,