Commit 5193c15 by zhilinw
1 parent: 3d6fe6d

Update README.md

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  license: other
  license_name: nvidia-open-model-license
- license_link: LICENSE
+ license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
  library_name: nemo
  language:
  - en
@@ -25,6 +25,8 @@ datasets:
  
  The Nemotron-4-340B-Reward is a multi-dimensional Reward Model that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs; Nemotron-4-340B-Reward consists of the Nemotron-4-340B-Base model and a linear layer that converts the final layer representation of the end-of-response token into five scalar values, each corresponding to a [HelpSteer2](https://arxiv.org/abs/2406.08673) attribute.
  
+ It supports a context length of up to 4,096 tokens.
+
  Given a conversation with multiple turns between user and assistant, it rates the following attributes (typically between 0 and 4) for every assistant turn.
  
  1. **Helpfulness**: Overall helpfulness of the response to the prompt.
@@ -36,9 +38,10 @@ Given a conversation with multiple turns between user and assistant, it rates th
  Nonetheless, if you are only interested in using it as a conventional reward model that outputs a singular scalar, we recommend using the weights ```[0, 0, 0, 0, 0.3, 0.74, 0.46, 0.47, -0.33]``` to do elementwise multiplication with the predicted attributes (which outputs 9 float values in line with [Llama2-13B-SteerLM-RM](https://huggingface.co/nvidia/Llama2-13B-SteerLM-RM) but the first four are not trained or used)
  
  Under the NVIDIA Open Model License, NVIDIA confirms:
- Models are commercially usable.
- You are free to create and distribute Derivative Models.
- NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.
+
+ * Models are commercially usable.
+ * You are free to create and distribute Derivative Models.
+ * NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.
  
  ### License:
  
@@ -54,11 +57,9 @@ Nemotron-4 340B-Reward can be used in the alignment stage to align pretrained mo
  
  **Model Input:** Text only
  **Input Format:** String
- **Input Parameters:** One-Dimensional (1D)
  
  **Model Output:** Scalar Values (List of 9 Floats)
  **Output Format:** Float
- **Output Parameters:** 1D
  
  **Model Dates:** Nemotron-4-340B-Reward was trained between December 2023 and May 2024
  
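The README text in this diff recommends collapsing the nine predicted floats into a single reward by multiplying them elementwise with a fixed weight vector. A minimal sketch of that step follows, assuming the elementwise products are then summed (a dot product), which the README implies but does not state outright. The function name `scalar_reward` and the example scores are hypothetical and are not part of NeMo or the model's API.

```python
from typing import Sequence

# Weights quoted in the README. The first four positions are zero because
# the corresponding outputs are not trained or used.
REWARD_WEIGHTS = [0, 0, 0, 0, 0.3, 0.74, 0.46, 0.47, -0.33]


def scalar_reward(
    attributes: Sequence[float],
    weights: Sequence[float] = REWARD_WEIGHTS,
) -> float:
    """Collapse the 9 predicted floats into one scalar reward.

    Assumption: the elementwise products are summed to give the scalar.
    """
    if len(attributes) != len(weights):
        raise ValueError(
            f"expected {len(weights)} attribute values, got {len(attributes)}"
        )
    return sum(w * a for w, a in zip(weights, attributes))


# Hypothetical model output for one assistant turn (not real model scores).
example_scores = [0.0, 0.0, 0.0, 0.0, 4.0, 4.0, 4.0, 2.0, 2.0]
print(scalar_reward(example_scores))
```

Because the first four weights are zero, whatever the model emits in those positions has no effect on the result; only the last five values contribute.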