---
license: apache-2.0
datasets:
  - argilla/ultrafeedback-binarized-preferences-cleaned
language:
  - en
base_model:
  - meta-llama/Llama-2-7b-hf
pipeline_tag: text-classification
---

This is the base reward model used in the paper "DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging".

For detailed information about this model, please refer to our paper.
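The following is a minimal sketch of how scores from a reward model like this one might be obtained and compared. It is not taken from the paper. It assumes the checkpoint loads with `AutoModelForSequenceClassification` (suggested by the `text-classification` pipeline tag) and returns a single scalar logit, that the repository id is `miulab/llama2-7b-ultrafeedback-rm`, and that two scores can be compared with the common Bradley-Terry formulation. The helper names `score_response` and `preference_probability` are hypothetical, and the expected input format (how the prompt and response are combined into `text`) is not specified here, so please check the paper before relying on the scores.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over response B.

    Assumption: the scores are scalar rewards where higher means better.
    """
    diff = score_a - score_b
    # Numerically stable logistic function of the score difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def score_response(text: str, repo_id: str = "miulab/llama2-7b-ultrafeedback-rm") -> float:
    """Return a scalar score for `text`.

    Assumptions: the checkpoint loads as a sequence-classification model with
    one output logit, and `text` is already in the format the model expects.
    """
    # Imported lazily so preference_probability works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits[0, 0].item()
```

For example, `preference_probability(score_response(text_a), score_response(text_b))` would estimate how strongly the model prefers `text_a` over `text_b` under these assumptions. Calling `score_response` downloads the full 7B checkpoint.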

If you find this model useful, please cite our paper:

@article{lin2024dogerm,
  title={DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging},
  author={Lin, Tzu-Han and Li, Chen-An and Lee, Hung-yi and Chen, Yun-Nung},
  journal={arXiv preprint arXiv:2407.01470},
  year={2024}
}