nvidia
/

NV-Llama2-13B-RLHF-RM

Text Generation

Model card Files Files and versions Community

jiaqiz commited on Feb 23

Commit

3e6dd79

•

1 Parent(s): cfa74e6

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -18,7 +18,7 @@ datasets:
 ## Description:
-Llama2-13B-RLHF-RM is a 13 billion parameter language model (with context of up to 4,096 tokens) used as the Reward Model in training [NV-Llama2-70B-RLHF](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/nv-llama2-70b-rlhf), which achieves 7.59 on MT-Bench and demonstrates strong performance on academic benchmarks.
 Starting from [Llama2-13B base model](https://huggingface.co/meta-llama/Llama-2-13b), it is first instruction-tuned with a combination of public and proprietary data and then trained on [HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf) with reward modeling objective. Given a conversation with multiple turns between user and assistant, it assigns a preference score on the last assistant turn.

 ## Description:
+Llama2-13B-RLHF-RM is a 13 billion parameter language model (with context of up to 4,096 tokens) used as the Reward Model in training [NV-Llama2-70B-RLHF](https://huggingface.co/nvidia/NV-Llama2-70B-RLHF-Chat), which achieves 7.59 on MT-Bench and demonstrates strong performance on academic benchmarks.
 Starting from [Llama2-13B base model](https://huggingface.co/meta-llama/Llama-2-13b), it is first instruction-tuned with a combination of public and proprietary data and then trained on [HH-RLHF dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf) with reward modeling objective. Given a conversation with multiple turns between user and assistant, it assigns a preference score on the last assistant turn.