[EVALS] Metrics compared to 3.1-70b Instruct by Meta

#11
by ID0M - opened

[attached image: image.png]

Meta's model is better?

NVIDIA org • edited Oct 18

To be clear - this model is NOT trained on any new data that has not been previously released. Instead, we use previously published preference data (HelpSteer2) and a public reward model (https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Reward) to tune this model with REINFORCE for human preferences.
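
For intuition, here is a minimal sketch of what REINFORCE-style preference tuning with a reward model looks like. This is an illustrative assumption of the general technique, not the actual Nemotron training code; the model names, batching, and hyperparameters are placeholders.

```python
# Minimal REINFORCE sketch: push the policy toward responses the reward model
# scores highly. Names are illustrative, not the real training setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_name = "meta-llama/Llama-3.1-70B-Instruct"  # starting policy (illustrative)
tokenizer = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(policy_name)


def reinforce_loss(prompt_ids, response_ids, reward, baseline):
    """One REINFORCE term: -(reward - baseline) * log pi(response | prompt).

    `reward` would come from scoring (prompt, response) with the reward model;
    `baseline` is any variance-reduction baseline (e.g. a batch mean).
    """
    input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
    logits = policy(input_ids).logits
    # Logits at position i predict token i+1, so the slice below lines up
    # each response token with the logit that predicted it.
    resp_logits = logits[:, prompt_ids.size(-1) - 1:-1, :]
    log_probs = torch.log_softmax(resp_logits, dim=-1)
    token_logp = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)
    advantage = reward - baseline
    return -(advantage * token_logp.sum(dim=-1)).mean()
```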

So, this model is not expected to be better at math, coding, etc. than the model we started with, llama-3.1-70b-instruct.
Instead, we expect (as indicated by Arena Hard, AlpacaEval, and MT-Bench) that humans may prefer responses from this model more.

We are currently validating this hypothesis on the lmsys.org Chatbot Arena and will update the model card with Elo scores once we have them.
