cambioml/rlhf-reward-model

Text Classification


goldmermaid committed on Jul 9, 2023

Commit 5a7b9c4 • 1 Parent(s): ab54afd

short instruction

Files changed (1)

README.md +41 -0

README.md ADDED

	@@ -0,0 +1,41 @@

+# 🚀 RLHF Step-2 Reward Model
+This repository is home to an RLHF reward model. The model is trained on questions and answers from the Stack Overflow Data Dump (https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences), using the `distilroberta-base` model (https://huggingface.co/distilroberta-base) as a base.
+## Usage
+You can use this model directly with a pipeline to score text, for example to rank candidate answers during RLHF training:
+```python
+from accelerate import Accelerator
+from transformers import (
+    AutoModelForSequenceClassification,
+    AutoTokenizer,
+    pipeline,
+)
+reward_model = AutoModelForSequenceClassification.from_pretrained(
+    "cambioml/rlhf-reward-model",
+    num_labels=1,
+    # torch_dtype=torch.bfloat16,
+    load_in_8bit=True,
+    device_map={"": Accelerator().process_index}
+)
+reward_tokenizer = AutoTokenizer.from_pretrained("cambioml/rlhf-reward-model")
+reward_tokenizer.pad_token = reward_tokenizer.eos_token
+reward_kwargs = {
+    "return_all_scores": True,
+    "function_to_apply": "none",
+    "batch_size": 32,
+    "truncation": True,
+    "max_length": 138
+}
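+# reward_kwargs are inference-time options, so they are passed when the
+# pipeline is called rather than as model_kwargs (which only affect loading).
+reward_pipe = pipeline(
+    "sentiment-analysis",
+    model=reward_model,
+    tokenizer=reward_tokenizer,
+    return_token_type_ids=False,
+)
+# Example call: returns one list of raw (unnormalized) scores per input text.
+pipe_outputs = reward_pipe(["text to score"], **reward_kwargs)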
+```
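
Once the pipeline returns its scores, a small helper can turn them into plain reward values. The sketch below is not part of the original README. It assumes the output shape a text-classification pipeline typically produces when called with `return_all_scores=True` on a `num_labels=1` model: one inner list per input text, containing a single `{"label": ..., "score": ...}` dict. The helper names `extract_rewards` and `best_candidate` are hypothetical, and the scores shown are placeholders rather than real model output.

```python
from typing import Any, Dict, List


def extract_rewards(pipe_outputs: List[List[Dict[str, Any]]]) -> List[float]:
    """Pull the single raw score out of each per-text result.

    Assumes one inner list per input text, each holding one
    {"label": ..., "score": ...} dict (num_labels=1).
    """
    return [float(per_text[0]["score"]) for per_text in pipe_outputs]


def best_candidate(candidates: List[str], rewards: List[float]) -> str:
    """Return the candidate text with the highest reward."""
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have the same length")
    best_index = max(range(len(rewards)), key=rewards.__getitem__)
    return candidates[best_index]


# Placeholder values standing in for `reward_pipe(texts, **reward_kwargs)`.
# These are NOT real scores from the model.
mock_outputs = [
    [{"label": "LABEL_0", "score": -0.5}],
    [{"label": "LABEL_0", "score": 1.25}],
]
texts = ["first candidate answer", "second candidate answer"]

rewards = extract_rewards(mock_outputs)
print(rewards)
print(best_candidate(texts, rewards))
```

In practice, `mock_outputs` would be replaced by the result of calling `reward_pipe` on the texts to score. With `"function_to_apply": "none"` the scores are raw logits, so only their relative order is meaningful.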