---
datasets:
  - PKU-Alignment/PKU-SafeRLHF
language:
  - en
tags:
  - reinforcement-learning-from-human-feedback
  - reinforcement-learning
  - beaver
  - safety
  - llama
  - ai-safety
  - deepspeed
  - rlhf
  - alpaca
library_name: safe-rlhf
---
# 🦫 Beaver's Cost Model

## Model Details

The Beaver cost model is a preference model trained on the [PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) dataset.
It serves as the cost model in the Safe RLHF algorithm, steering the Beaver model toward safer, less harmful responses.

- **Developed by:** the [PKU-Alignment](https://github.com/PKU-Alignment) Team.
- **Model Type:** A score model built on an auto-regressive, transformer-based language model; it outputs a scalar score for every token.
- **License:** Non-commercial license.
- **Fine-tuned from model:** [LLaMA](https://arxiv.org/abs/2302.13971), [Alpaca](https://github.com/tatsu-lab/stanford_alpaca).

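In Safe RLHF, the cost model's output is not merged into the reward with a fixed weight. Instead, it defines a safety constraint on the policy, and the constrained problem is solved with a Lagrangian method.

The snippet below is a minimal, generic sketch of that idea and is **not** the safe-rlhf implementation. Several parts of it are assumptions:

- The function names, the cost budget of `0.0`, and the learning rate are hypothetical.
- It assumes a higher cost means a more harmful response.

Under these assumptions, the policy would be trained on the penalized reward `r - lam * c`. The multiplier `lam` rises when the average cost exceeds the budget and falls otherwise, never dropping below zero.

```python
# Hypothetical sketch of Lagrangian reward shaping with a cost model.
# NOT the safe-rlhf implementation; names, budget, and learning rate are illustrative.
from statistics import mean


def penalized_rewards(rewards: list[float], costs: list[float], lam: float) -> list[float]:
    """Combine reward and cost for each response into one signal: r - lam * c."""
    return [r - lam * c for r, c in zip(rewards, costs)]


def update_multiplier(lam: float, costs: list[float], budget: float = 0.0, lr: float = 0.1) -> float:
    """One projected gradient-ascent step on the Lagrange multiplier.

    lam grows when the mean cost exceeds the budget (constraint violated) and
    shrinks otherwise; it is clipped at 0 because the multiplier must stay non-negative.
    """
    violation = mean(costs) - budget
    return max(0.0, lam + lr * violation)


if __name__ == "__main__":
    rewards = [1.0, 0.5, 2.0]
    costs = [3.0, -1.0, 1.0]  # assumed convention: higher cost = more harmful
    lam = 1.0
    print(penalized_rewards(rewards, costs, lam))  # [-2.0, 1.5, 1.0]
    lam = update_multiplier(lam, costs)  # mean cost 1.0 > budget 0.0, so lam increases
    print(lam)
```

Because `lam` adapts to the observed costs, the balance between helpfulness and harmlessness does not have to be hand-tuned as a fixed penalty weight. This is the main appeal of the Lagrangian formulation.
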
## Model Sources

- **Repository:** <https://github.com/PKU-Alignment/safe-rlhf>
- **Beaver:** <https://huggingface.co/PKU-Alignment/beaver-7b-v3.0>
- **Dataset:** <https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF>
- **Reward Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-unified-reward>
- **Cost Model:** <https://huggingface.co/PKU-Alignment/beaver-7b-unified-cost>
- **Dataset Paper:** <https://arxiv.org/abs/2307.04657>
- **Paper:** <https://arxiv.org/abs/2310.12773>

## How to Use the Cost Model

```python
import torch
from transformers import AutoTokenizer
from safe_rlhf.models import AutoModelForScore

# Load the cost model in bfloat16; device_map='auto' lets Accelerate place the weights.
model = AutoModelForScore.from_pretrained(
    'PKU-Alignment/beaver-7b-unified-cost',
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained('PKU-Alignment/beaver-7b-unified-cost')

# A single-turn conversation in the Beaver prompt format.
text = 'BEGINNING OF CONVERSATION: USER: hello ASSISTANT:Hello! How can I help you today?'
inputs = tokenizer(text, return_tensors='pt')  # BatchEncoding with input_ids and attention_mask
output = model(**inputs)
# `scores` holds one score per token; `end_scores` is the score at the final token (the conversation's cost).
print(output)
# ScoreModelOutput(
#     scores=tensor([[[-2.7656],
#          [ 0.8320],
#          [-2.7656],
#          [-2.7500],
#          [-0.9023],
#          [-0.7891],
#          [-0.3125],
#          [-0.8008],
#          [-0.5117],
#          [-1.1562],
#          [-2.3906],
#          [-1.2266],
#          [-1.1797],
#          [-3.3281],
#          [-4.4062],
#          [-1.0234],
#          [-1.1484],
#          [-2.1406],
#          [-2.9531],
#          [-4.6250],
#          [-4.5312],
#          [-3.3594],
#          [-4.1250],
#          [-3.0156],
#          [-3.5156],
#          [-5.0000],
#          [-5.7812],
#          [-7.6562]]], grad_fn=<ToCopyBackward0>),
#     end_scores=tensor([[-7.6562]], grad_fn=<ToCopyBackward0>),
#     last_hidden_state=tensor([[[ 0.7148,  0.3594, -1.0234,  ...,  0.5039, -0.0737,  1.4375],
#          [ 1.0781, -1.2812,  1.5078,  ...,  0.9102,  1.3594,  1.4141],
#          [ 0.8047,  0.4551, -0.3262,  ...,  0.3887,  0.6484, -0.4629],
#          ...,
#          [-0.1836, -0.6094, -0.8086,  ..., -0.5078,  0.8086,  1.1719],
#          [ 0.9727, -1.5156, -1.2656,  ..., -0.9766,  0.3535,  1.0156],
#          [ 4.2812, -1.6797, -0.4238,  ...,  0.6758, -1.1875, -1.1562]]],
#        dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>),
#     end_last_hidden_state=tensor([[ 4.2812, -1.6797, -0.4238,  ...,  0.6758, -1.1875, -1.1562]],
#        dtype=torch.bfloat16, grad_fn=<ToCopyBackward0>),
#     end_index=tensor([27])
# )
```
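
When scoring many (prompt, response) pairs, it helps to build the input string with one function so it always matches the format in the example above. The sketch below assumes the single-turn template `BEGINNING OF CONVERSATION: USER: {prompt} ASSISTANT:{response}`, read off that example; multi-turn formatting is not covered.

`build_conversation`, `last_token_score`, and `is_flagged_harmful` are hypothetical helpers, not part of `safe_rlhf`:

- `last_token_score` mirrors, in plain Python, the relationship visible in the example output: `end_scores` equals the entry of `scores` at `end_index`, which is the last token.
- The zero threshold in `is_flagged_harmful` reflects our reading of the Safe RLHF paper. The paper trains the cost model so that harmful responses tend to receive positive costs and harmless ones negative costs. Treat this threshold as an assumption, not a calibrated decision rule.

```python
# Hypothetical helpers (NOT part of safe_rlhf) for preparing cost-model inputs
# and reading its per-token scores. The template is copied from the usage example.

TEMPLATE = "BEGINNING OF CONVERSATION: USER: {prompt} ASSISTANT:{response}"


def build_conversation(prompt: str, response: str) -> str:
    """Format a single-turn exchange exactly like the usage example."""
    return TEMPLATE.format(prompt=prompt, response=response)


def last_token_score(scores: list[float], attention_mask: list[int]) -> float:
    """Return the score at the last position whose attention-mask value is 1.

    Raises ValueError if the mask contains no 1s.
    """
    end_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return scores[end_index]


def is_flagged_harmful(end_score: float, threshold: float = 0.0) -> bool:
    """Assumed convention: a cost above the threshold suggests a harmful response."""
    return end_score > threshold


if __name__ == "__main__":
    text = build_conversation("hello", "Hello! How can I help you today?")
    print(text)  # same string as in the usage example
    # Toy per-token scores for a padded sequence: the last two positions are padding.
    scores = [-2.8, 0.8, -3.4, -7.7, 0.0, 0.0]
    mask = [1, 1, 1, 1, 0, 0]
    cost = last_token_score(scores, mask)
    print(cost, is_flagged_harmful(cost))  # -7.7 False
```

With real model outputs, `output.end_scores` already holds the final-token cost. `last_token_score` is only needed when working with per-token scores directly, for example `output.scores[0, :, 0].tolist()` paired with the tokenizer's `attention_mask`.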