
license: llama3
library_name: nemo
language:
  - en
inference: false
fine-tuning: false
tags:
  - nvidia
  - steerlm
  - llama3
  - reward model
datasets:
  - nvidia/HelpSteer2

Llama3-70B-SteerLM-RM

License

The use of this model is governed by the Llama 3 Community License Agreement.

Description:

Llama3-70B-SteerLM-RM is a 70-billion-parameter language model (with a context length of up to 8,192 tokens) used as an Attribute Prediction Model: a multi-aspect reward model that rates model responses on the various aspects that make a response desirable, instead of assigning a single score as a conventional reward model does.

Given a conversation with multiple turns between user and assistant, it rates the following attributes (between 0 and 4) for every assistant turn.

Helpfulness: Overall helpfulness of the response to the prompt.
Correctness: Inclusion of all pertinent facts without errors.
Coherence: Consistency and clarity of expression.
Complexity: Intellectual depth required to write the response (i.e., whether the response could be written by anyone with basic language competency or requires deep domain expertise).
Verbosity: Amount of detail included in the response, relative to what is asked for in the prompt.

If you only need a conventional reward model that outputs a single scalar, we recommend multiplying the predicted attributes element-wise by the weights [0, 0, 0, 0, 0.65, 0.8, 0.45, 0.55, -0.4] and summing the result. The model outputs 9 float values, in line with Llama2-13B-SteerLM-RM, but the first four are not trained or used.
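A minimal Python sketch of this weighting follows. It assumes the nine predicted values arrive as a plain list of floats and that positions 4 to 8 hold helpfulness, correctness, coherence, complexity and verbosity, in the order the attributes are listed above. `scalar_reward` and `rank_responses` are hypothetical helper names, not NeMo-Aligner functions, and the attribute values are made up.

```python
from typing import Sequence

# Recommended weights. The first four slots are untrained placeholders kept for
# compatibility with Llama2-13B-SteerLM-RM, so they get weight 0.
# Assumption: indices 4..8 are helpfulness, correctness, coherence, complexity,
# verbosity (the order in which the attributes are listed above).
WEIGHTS = [0.0, 0.0, 0.0, 0.0, 0.65, 0.8, 0.45, 0.55, -0.4]


def scalar_reward(attributes: Sequence[float]) -> float:
    """Multiply the 9 predicted floats element-wise by WEIGHTS and sum (a dot product)."""
    if len(attributes) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} values, got {len(attributes)}")
    return sum(w * a for w, a in zip(WEIGHTS, attributes))


def rank_responses(scored: dict[str, Sequence[float]]) -> list[str]:
    """Return response ids sorted from highest to lowest scalar reward."""
    return sorted(scored, key=lambda rid: scalar_reward(scored[rid]), reverse=True)


if __name__ == "__main__":
    # Hypothetical attribute predictions for two candidate responses.
    preds = {
        "response_a": [0.0, 0.0, 0.0, 0.0, 3.5, 3.6, 3.9, 1.8, 2.0],
        "response_b": [0.0, 0.0, 0.0, 0.0, 2.1, 2.4, 3.7, 1.5, 3.2],
    }
    for rid in rank_responses(preds):
        print(rid, round(scalar_reward(preds[rid]), 3))
```

Because verbosity carries a negative weight, a response that is more verbose than another with otherwise identical scores receives a lower scalar reward.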

Llama3-70B-SteerLM-RM is trained from Llama 3 70B Base with the HelpSteer2 dataset.

HelpSteer2 paper: HelpSteer2: Open-source dataset for training top-performing reward models

Llama3-70B-SteerLM-RM is trained with NVIDIA NeMo-Aligner, a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built using the NeMo Framework which allows for scaling training with data and model parallelism for all components of alignment. All of our checkpoints are compatible with the NeMo ecosystem, allowing for inference deployment and further customization.

RewardBench Primary Dataset Leaderboard

Model	Type of Model	Overall	Chat	Chat-Hard	Safety	Reasoning
Nemotron-4-340B-Reward	Trained with Permissive Licensed Data	92.0	95.8	87.1	91.5	93.7
ArmoRM-Llama3-8B-v0.1	Trained with GPT4 Generated Data	90.8	96.9	76.8	92.2	97.3
Cohere May 2024	Proprietary LLM	89.5	96.4	71.3	92.7	97.7
Llama3-70B-SteerLM-RM	Trained with Permissive Licensed Data	88.8	91.3	80.3	92.8	90.7
Google Gemini Pro 1.5	Proprietary LLM	88.1	92.3	80.6	87.5	92.0
RLHFlow-Llama3-8B	Trained with GPT4 Generated Data	87.1	98.3	65.8	89.7	94.7
Cohere March 2024	Proprietary LLM	87.1	94.7	65.1	90.3	98.7
GPT-4-0125-Preview	Proprietary LLM	85.9	95.3	74.3	87.2	86.9
Claude 3 Opus 0229	Proprietary LLM	80.7	94.7	60.3	89.1	78.7
Llama3 70B Instruct	Trained with Permissive Licensed Data	76.0	97.6	58.9	69.2	78.5

Last updated: 12 Jun 2024

Note that we only consider the first four categories in RewardBench, because the optional fifth category (Prior Sets) is:

Heavily biased towards models trained on Anthropic HHH, Anthropic Helpful, OpenAI Summarize and Stanford Human Preferences (the constituent datasets of the Prior Sets category), and therefore easily gamed (see the About page on RewardBench).
Extremely noisy: many constituent datasets (e.g. Anthropic Helpful, OpenAI Summarize) cannot reach a validation accuracy beyond ~0.70 even when a model is trained on their own training sets alone, suggesting unchecked annotation errors (see https://arxiv.org/abs/2401.06080 for details).
Not reported for several models, such as Google Gemini Pro 1.5 and Claude 3 Opus 0229, which makes comparisons unfair since Prior Sets typically has lower scores than the other categories.
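The card does not state how the Overall column is computed. Assuming it is the unweighted mean of the four primary categories, the Python sketch below recomputes it from the table and checks each row against the reported value, allowing for the rounding of the displayed scores. `LEADERBOARD` and `overall` are illustrative names; for the figures above, every row agrees within that tolerance.

```python
# Scores copied from the leaderboard table:
# (Chat, Chat-Hard, Safety, Reasoning, reported Overall).
LEADERBOARD = {
    "Nemotron-4-340B-Reward": (95.8, 87.1, 91.5, 93.7, 92.0),
    "ArmoRM-Llama3-8B-v0.1": (96.9, 76.8, 92.2, 97.3, 90.8),
    "Cohere May 2024": (96.4, 71.3, 92.7, 97.7, 89.5),
    "Llama3-70B-SteerLM-RM": (91.3, 80.3, 92.8, 90.7, 88.8),
    "Google Gemini Pro 1.5": (92.3, 80.6, 87.5, 92.0, 88.1),
    "RLHFlow-Llama3-8B": (98.3, 65.8, 89.7, 94.7, 87.1),
    "Cohere March 2024": (94.7, 65.1, 90.3, 98.7, 87.1),
    "GPT-4-0125-Preview": (95.3, 74.3, 87.2, 86.9, 85.9),
    "Claude 3 Opus 0229": (94.7, 60.3, 89.1, 78.7, 80.7),
    "Llama3 70B Instruct": (97.6, 58.9, 69.2, 78.5, 76.0),
}

# Each displayed category score is rounded to 0.1 (error up to 0.05, so the mean
# is also off by up to 0.05), and the reported Overall adds up to 0.05 more.
TOLERANCE = 0.1 + 1e-9


def overall(chat: float, chat_hard: float, safety: float, reasoning: float) -> float:
    """Unweighted mean of the four primary categories (assumed scoring rule)."""
    return (chat + chat_hard + safety + reasoning) / 4


for model, (*categories, reported) in LEADERBOARD.items():
    computed = overall(*categories)
    status = "ok" if abs(computed - reported) <= TOLERANCE else "MISMATCH"
    print(f"{model:<24} computed={computed:.3f} reported={reported:.1f} {status}")
```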

Usage:

You can use the model with NeMo-Aligner by following the SteerLM training user guide.

Spin up an inference server within the NeMo container (docker pull nvcr.io/nvidia/nemo:24.01.framework):

HF_HOME=<YOUR_HF_HOME_CONTAINING_TOKEN_WITH_LLAMA3_70B_ACCESS> \
python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
      rm_model_file=Llama3-70B-SteerLM-RM \
      trainer.num_nodes=1 \
      trainer.devices=8 \
      ++model.tensor_model_parallel_size=8 \
      ++model.pipeline_model_parallel_size=1 \
      inference.micro_batch_size=2 \
      inference.port=1424

Annotate data files using the served reward model. As an example, these can be the Open Assistant train/val files. Then follow the next step of the SteerLM training user guide to train a SteerLM model.

Please note that the annotation script (attribute_annotate.py, used below) rounds the predicted floats to the nearest integer (between 0 and 4 inclusive), as it is meant for SteerLM training. For other use cases (e.g. RewardBench measurement, response filtering/ranking), we recommend using the floats directly, which can be done by commenting out two lines of code in NeMo-Aligner.
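To illustrate what the rounding discards, here is a hypothetical Python helper that rounds and clamps as described. It is not the NeMo-Aligner code, and its tie handling is an assumption: Python's `round()` rounds halves to even (2.5 becomes 2), which may differ from the actual script.

```python
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]


def to_steerlm_ints(values: list[float]) -> list[int]:
    """Round each float to the nearest int, then clamp it to the range [0, 4]."""
    return [max(0, min(4, round(v))) for v in values]


raw = [3.62, 3.71, 3.95, 1.48, 2.5]  # hypothetical float predictions
print(dict(zip(ATTRIBUTES, to_steerlm_ints(raw))))  # integer labels for SteerLM
print(dict(zip(ATTRIBUTES, raw)))                   # floats for ranking/filtering
```

Here 3.62 and 3.95 both collapse to 4, so two responses that the floats would distinguish become tied; this is why the floats are preferable for ranking or filtering.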

python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst

python /opt/NeMo-Aligner/examples/nlp/data/steerlm/attribute_annotate.py \
      --input-file=data/oasst/train.jsonl \
      --output-file=data/oasst/train_labeled.jsonl \
      --port=1424

Alternatively, the input can be any conversational data file (in .jsonl format) in which each line is a JSON object of the following form (shown pretty-printed here):

{
    "conversations": [
        {"value": <user_turn_1>, "from": "User", "label": null},
        {"value": <assistant_turn_1>, "from": "Assistant", "label": <formatted_label_1>},
        {"value": <user_turn_2>, "from": "User", "label": null},
        {"value": <assistant_turn_2>, "from": "Assistant", "label": <formatted_label_2>}
    ],
    "mask": "User"
}

Ideally, each <formatted_label_n> is the ground-truth label for that assistant turn. If ground-truth labels are not available, you can use helpfulness:4,correctness:4,coherence:4,complexity:2,verbosity:2 (i.e., defaulting to moderate complexity and verbosity; adjust if needed), or simply helpfulness:-1. The label must not be null (None in Python) or an empty string.
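The following Python sketch builds one record in this format and serializes it as a single .jsonl line. The helper names (`format_label`, `parse_label`, `make_record`) are hypothetical; the comma-separated `name:value` label syntax follows the placeholder example above, and user turns carry a null label as in the template.

```python
import json

ATTRIBUTES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")

# Placeholder label suggested above when ground truth is unavailable.
DEFAULT_LABEL = {"helpfulness": 4, "correctness": 4, "coherence": 4,
                 "complexity": 2, "verbosity": 2}


def format_label(scores: dict[str, int]) -> str:
    """Join 'name:value' pairs with commas, in ATTRIBUTES order."""
    return ",".join(f"{name}:{scores[name]}" for name in ATTRIBUTES if name in scores)


def parse_label(label: str) -> dict[str, int]:
    """Inverse of format_label, e.g. 'helpfulness:-1' -> {'helpfulness': -1}."""
    return {k: int(v) for k, v in (pair.split(":") for pair in label.split(","))}


def make_record(turns: list[tuple[str, str]]) -> dict:
    """Build one record from (user_text, assistant_text) pairs.

    User turns get label None (serialized as null); assistant turns get the
    placeholder label, so their label is never null or an empty string.
    """
    conversations = []
    for user_text, assistant_text in turns:
        conversations.append({"value": user_text, "from": "User", "label": None})
        conversations.append({"value": assistant_text, "from": "Assistant",
                              "label": format_label(DEFAULT_LABEL)})
    return {"conversations": conversations, "mask": "User"}


record = make_record([("What is 2 + 2?", "2 + 2 = 4.")])  # hypothetical turns
print(json.dumps(record))  # one JSON object per line in the .jsonl file
```

Writing one such line per conversation produces a file in the input format described above.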

Contact

E-Mail: Zhilin Wang

Citation

If you find this model useful, please cite the following work:

@misc{wang2024helpsteer2,
      title={HelpSteer2: Open-source dataset for training top-performing reward models}, 
      author={Zhilin Wang and Yi Dong and Olivier Delalleau and Jiaqi Zeng and Gerald Shen and Daniel Egert and Jimmy J. Zhang and Makesh Narsimhan Sreedhar and Oleksii Kuchaiev},
      year={2024},
      eprint={2406.08673},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}