RewardBench results?

by Avelina - opened Jul 12, 2024

Jul 12, 2024

I think it would be useful for the model to be evaluated on RewardBench and the results published.

Researchers and developers could then see how large the gap is between this RM and the 70B Llama 3 RM, which would help us evaluate the price/quality tradeoff between the two models.

zhilinw

NVIDIA org Jul 15, 2024

Hi, thank you for your interest in this model! Compared to https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM, this model has an approximately 15% lower overall RewardBench score, partly because of its older and smaller Llama2 base model and partly because of the reward modeling datasets used. We highly recommend using the 70B Llama 3 RM over this model.
