Text Generation
NeMo
English
nvidia
steerlm
llama2
reward model

RewardBench results?

#2
by Avelina - opened

I think it would be useful for the model to be evaluated on RewardBench and the results published.

This would show researchers and developers how large the gap is between this RM and the 70B Llama 3 RM, which would help us evaluate the price/quality tradeoff between the two models.

NVIDIA org

Hi, thank you for your interest in this model! Compared to https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM, this model has an approximately 15% lower overall RewardBench score, partly due to the older and smaller Llama2 base model and partly due to the reward modeling datasets used. We highly recommend using the 70B Llama 3 RM over this model.
