---
license: llama3
library_name: nemo
language:
- en
inference: false
fine-tuning: false
tags:
- nvidia
- steerlm
- llama3
- reward model
datasets:
- nvidia/HelpSteer2
---

# Llama3-70B-SteerLM-RM

## License
The use of this model is governed by the [Llama 3 Community License Agreement](https://github.com/meta-llama/llama3/blob/main/LICENSE).

## Description:
Llama3-70B-SteerLM-RM is a 70 billion parameter language model (with a context length of up to 8,192 tokens) used as an Attribute Prediction Model: a multi-aspect Reward Model that rates model responses on the various aspects that make a response desirable, rather than producing the single scalar score of a conventional Reward Model.

Given a conversation with multiple turns between a user and an assistant, it rates the following attributes (each on a scale of 0 to 4) for every assistant turn.

1. **Helpfulness**: Overall helpfulness of the response to the prompt.
2. **Correctness**: Inclusion of all pertinent facts without errors.
3. **Coherence**: Consistency and clarity of expression.
4. **Complexity**: Intellectual depth required to write the response (i.e. whether the response can be written by anyone with basic language competency or requires deep domain expertise).
5. **Verbosity**: Amount of detail included in the response, relative to what is asked for in the prompt.

Nonetheless, if you are only interested in using it as a conventional reward model that outputs a single scalar, we recommend doing an elementwise multiplication of the predicted attributes with the weights ```[0, 0, 0, 0, 0.65, 0.8, 0.45, 0, 0]``` and summing the result. The model outputs 9 float values, in line with [Llama2-13B-SteerLM-RM](https://huggingface.co/nvidia/Llama2-13B-SteerLM-RM), but the first four are not trained or used.

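As a minimal sketch of that weighted combination, assuming the 9 outputs follow the Llama2-13B-SteerLM-RM ordering so that the last five positions are the attributes listed above (helpfulness, correctness, coherence, complexity, verbosity); the function name and example values below are hypothetical:

```python
# Minimal sketch: collapse the 9 predicted attribute values into one scalar reward.
# Assumes `predicted_attributes` holds the model's 9 outputs with the first four
# positions untrained/unused and the last five being
# helpfulness, correctness, coherence, complexity, verbosity (in that order).
WEIGHTS = [0, 0, 0, 0, 0.65, 0.8, 0.45, 0, 0]

def scalar_reward(predicted_attributes: list[float]) -> float:
    """Elementwise-multiply the predictions by WEIGHTS and sum the result."""
    return sum(w * a for w, a in zip(WEIGHTS, predicted_attributes))

# Hypothetical attribute values for illustration only.
example = [0.0, 0.0, 0.0, 0.0, 3.6, 3.2, 3.9, 1.8, 2.1]
print(scalar_reward(example))  # 0.65*3.6 + 0.8*3.2 + 0.45*3.9 ≈ 6.655
```

Under that ordering assumption, the scalar is dominated by helpfulness and correctness, with a smaller contribution from coherence.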

Llama3-70B-SteerLM-RM is trained from [Llama 3 70B Base](https://huggingface.co/meta-llama/Meta-Llama-3-70B) with the [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) dataset.

HelpSteer Paper: [HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM](http://arxiv.org/abs/2311.09528)

SteerLM Paper: [SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF](https://arxiv.org/abs/2310.05344)

Llama3-70B-SteerLM-RM is trained with NVIDIA [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner), a scalable toolkit for performant and efficient model alignment. NeMo-Aligner is built on the [NeMo Framework](https://github.com/NVIDIA/NeMo), which allows scaling training to thousands of GPUs using tensor, data and pipeline parallelism for all components of alignment. All of our checkpoints are cross-compatible with the NeMo ecosystem, allowing for inference deployment and further customization.

## RewardBench Leaderboard

| Model | Type of Model | Overall | Chat | Chat Hard | Safety | Reasoning |
|:-----------------------------|:----------------|:-----|:----------|:-------|:----------|:-----------------------|
| Nemotron-4-340B-SteerLM-RM | Proprietary LLM | **91.6** | 95.5 | **86.4** | 90.8 | 93.6 |
| ArmoRM-Llama3-8B-v0.1 | Trained with GPT4 Data | 90.8 | 96.9 | 76.8 | 92.2 | 97.3 |
| Cohere May 2024 | Proprietary LLM | 89.5 | 96.4 | 71.3 | **92.7** | 97.7 |
| **Llama3-70B-SteerLM-RM** | Trained with Permissive Licensed Data | 88.2 | 91.9 | 79.8 | 92.2 | 89.0 |
| Google Gemini Pro 1.5 | Proprietary LLM | 88.1 | 92.3 | 80.6 | 87.5 | 92.0 |
| RLHFlow-Llama3-8B | Trained with GPT4 Data | 87.1 | **98.3** | 65.8 | 89.7 | 94.7 |
| Cohere March 2024 | Proprietary LLM | 87.1 | 94.7 | 65.1 | 90.3 | **98.7** |
| GPT-4-0125-Preview | Proprietary LLM | 85.9 | 95.3 | 74.3 | 87.2 | 86.9 |
| Claude 3 Opus 0229 | Proprietary LLM | 80.7 | 94.7 | 60.3 | 89.1 | 78.7 |

Last updated: 1 Jun 2024

Note that we only consider the first four categories in RewardBench, because the optional fifth category (Prior Sets) is:
1. Heavily biased towards models trained on Anthropic HHH, Anthropic Helpful, OpenAI Summarize and Stanford Human Preferences (the constituent datasets of the Prior Sets category) and can therefore be easily gamed (see the About page on RewardBench)
2. Extremely noisy: many constituent datasets (e.g. Anthropic Helpful, OpenAI Summarize) cannot reach validation accuracy beyond ~0.70 even when training on the training set alone, suggesting unchecked errors in annotation (see https://arxiv.org/abs/2401.06080 for details)
3. Not reported by several models such as Google Gemini Pro 1.5 and Claude 3 Opus 0229, making comparisons unfair since Prior Sets typically scores lower than the other categories

## Usage:

You can use the model with [NeMo Aligner](https://github.com/NVIDIA/NeMo-Aligner) following the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html).

1. Spin up an inference server within the [NeMo Aligner container](https://github.com/NVIDIA/NeMo-Aligner/blob/main/Dockerfile)

```bash
python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
      rm_model_file=Llama3-70B-SteerLM-RM \
      trainer.num_nodes=1 \
      trainer.devices=8 \
      ++model.tensor_model_parallel_size=8 \
      ++model.pipeline_model_parallel_size=1 \
      inference.micro_batch_size=2 \
      inference.port=1424
```

2. Annotate data files using the served reward model. As an example, these can be the Open Assistant train/val files. Then follow the next step to train a SteerLM model, based on the [SteerLM training user guide](https://docs.nvidia.com/nemo-framework/user-guide/latest/modelalignment/steerlm.html#step-5-train-the-attribute-conditioned-sft-model).

```bash
python /opt/NeMo-Aligner/examples/nlp/data/steerlm/preprocess_openassistant_data.py --output_directory=data/oasst

python /opt/NeMo-Aligner/examples/nlp/data/steerlm/attribute_annotate.py \
      --input-file=data/oasst/train.jsonl \
      --output-file=data/oasst/train_labeled.jsonl \
      --port=1424
```

3. Alternatively, the input can be any conversational data file (in .jsonl) in the following format, where each line looks like

```json
{
    "conversations": [
        {"value": <user_turn_1>, "from": "User", "label": None},
        {"value": <assistant_turn_1>, "from": "Assistant", "label": <formatted_label_1>},
        {"value": <user_turn_2>, "from": "User", "label": None},
        {"value": <assistant_turn_2>, "from": "Assistant", "label": <formatted_label_2>}
    ],
    "mask": "User"
}
```

Ideally, each ```<formatted_label_n>``` refers to the ground-truth label for the assistant turn, but if labels are not available, you can also use ```helpfulness:4,correctness:4,coherence:4,complexity:2,verbosity:2``` (i.e. defaulting to moderate complexity and verbosity; adjust as needed), or simply ```helpfulness:-1```. The label must not be ```None``` or an empty string.

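If you are assembling such a .jsonl file yourself, a minimal Python sketch is shown below; the helper name, file path, and conversation contents are illustrative, and the unlabeled user turns are written as JSON null (Python's None):

```python
import json

def write_conversations_jsonl(path, conversations_list):
    """Write one conversation dict per line in the format shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for conv in conversations_list:
            f.write(json.dumps(conv, ensure_ascii=False) + "\n")

# One illustrative conversation; the assistant label defaults to moderate
# complexity/verbosity, as suggested above when ground truth is unavailable.
example = {
    "conversations": [
        {"value": "What is SteerLM?", "from": "User", "label": None},
        {
            "value": "SteerLM is an attribute-conditioned SFT technique.",
            "from": "Assistant",
            "label": "helpfulness:4,correctness:4,coherence:4,complexity:2,verbosity:2",
        },
    ],
    "mask": "User",
}

write_conversations_jsonl("conversations.jsonl", [example])
```
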
## Contact

E-Mail: [Zhilin Wang](mailto:zhilinw@nvidia.com)


## Citation

If you find this model useful, please cite the following works:

```bibtex
@misc{wang2023helpsteer,
      title={HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM},
      author={Zhilin Wang and Yi Dong and Jiaqi Zeng and Virginia Adams and Makesh Narsimhan Sreedhar and Daniel Egert and Olivier Delalleau and Jane Polak Scowcroft and Neel Kant and Aidan Swope and Oleksii Kuchaiev},
      year={2023},
      eprint={2311.09528},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

```bibtex
@misc{dong2023steerlm,
      title={SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF},
      author={Yi Dong and Zhilin Wang and Makesh Narsimhan Sreedhar and Xianchao Wu and Oleksii Kuchaiev},
      year={2023},
      eprint={2310.05344},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```