# 🚀 RLHF Step-2 Reward Model
This repository hosts an RLHF reward model trained on question–answer pairs from the [Stack Exchange preferences dataset](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences), using [`distilroberta-base`](https://huggingface.co/distilroberta-base) as the base model.
## Usage
You can use this model directly with a `sentiment-analysis` pipeline to score candidate responses: with `num_labels=1` and `function_to_apply="none"`, the single output logit serves as the scalar reward used during RLHF fine-tuning.
```python
import torch
from accelerate import Accelerator
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

# Load the reward model with a single output head (the scalar reward).
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "cambioml/rlhf_reward_model",
    num_labels=1,
    # torch_dtype=torch.bfloat16,
    load_in_8bit=True,
    device_map={"": Accelerator().process_index},
)

reward_tokenizer = AutoTokenizer.from_pretrained("cambioml/rlhf_reward_model")
reward_tokenizer.pad_token = reward_tokenizer.eos_token

reward_pipe = pipeline(
    "sentiment-analysis",
    model=reward_model,
    tokenizer=reward_tokenizer,
    return_token_type_ids=False,
)

# Inference-time settings: "none" returns the raw logit rather than a
# softmax probability, which is what we want for a scalar reward.
reward_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",
    "batch_size": 32,
    "truncation": True,
    "max_length": 138,
}

# Pass the kwargs when calling the pipeline, not at construction time:
# outputs = reward_pipe(texts, **reward_kwargs)
```
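With `return_all_scores=True`, each pipeline result is a list of `{"label": ..., "score": ...}` dicts, and because the model has `num_labels=1` there is exactly one entry per input. Below is a minimal sketch of unpacking the scalar rewards from that output shape; the `extract_rewards` helper and the example values are illustrative, not part of this repository.

```python
def extract_rewards(pipe_outputs):
    """Pull the scalar reward out of each pipeline result.

    With return_all_scores=True and num_labels=1, each result is a
    one-element list of {"label": ..., "score": ...} dicts, where
    "score" is the raw logit (function_to_apply="none").
    """
    return [result[0]["score"] for result in pipe_outputs]


# Illustrative output in the shape the pipeline produces:
pipe_outputs = [
    [{"label": "LABEL_0", "score": 1.73}],
    [{"label": "LABEL_0", "score": -0.42}],
]
rewards = extract_rewards(pipe_outputs)
print(rewards)  # → [1.73, -0.42]
```

These scalar rewards are what an RLHF trainer (e.g. a PPO loop) would consume for each sampled response.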