
# 🚀 RLHF Step-2 Reward Model

This repository hosts an RLHF step-2 reward model, trained on questions and answers from the Stack Overflow Data Dump ([HuggingFaceH4/stack-exchange-preferences](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences)) with [distilroberta-base](https://huggingface.co/distilroberta-base) as the base model.
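Reward models in RLHF step 2 are typically trained with a pairwise ranking objective that pushes the score of the human-preferred answer above the rejected one. The card does not document the exact training recipe for this checkpoint, so the snippet below is only a minimal sketch of that common objective, not a description of how this model was trained:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(
    chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor
) -> torch.Tensor:
    """Bradley-Terry-style loss commonly used for preference reward models.

    Illustrative only: this checkpoint's actual training setup is undocumented.
    """
    # Maximize the margin between the preferred and rejected answers' scores.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```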

## Usage

You can use this model with a `text-classification` pipeline (loaded under its `sentiment-analysis` alias below) to assign a scalar reward score to a question-answer pair, for example as the reward signal during RLHF fine-tuning:

```python
from accelerate import Accelerator
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

# Load the reward model as a single-label sequence classifier;
# its one output logit is the scalar reward.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "cambioml/rlhf_reward_model",
    num_labels=1,
    # torch_dtype=torch.bfloat16,  # alternative to 8-bit loading (needs `import torch`)
    load_in_8bit=True,
    device_map={"": Accelerator().process_index},
)

reward_tokenizer = AutoTokenizer.from_pretrained("cambioml/rlhf_reward_model")
reward_tokenizer.pad_token = reward_tokenizer.eos_token

# Keyword arguments to pass when *calling* the pipeline (not when building it).
# "function_to_apply": "none" returns the raw logit, i.e. the reward itself.
reward_kwargs = {
    "return_all_scores": True,
    "function_to_apply": "none",
    "batch_size": 32,
    "truncation": True,
    "max_length": 138,
}

reward_pipe = pipeline(
    "sentiment-analysis",  # alias for text-classification
    model=reward_model,
    tokenizer=reward_tokenizer,
    return_token_type_ids=False,
)
```
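For example, to score candidate answers (the `Question:`/`Answer:` prompt format below is an assumption for illustration; the card does not document the exact format used in training):

```python
# Score question-answer pairs; higher rewards indicate better answers.
texts = [
    "Question: How do I reverse a list in Python?\n\n"
    "Answer: Use my_list[::-1], or my_list.reverse() to reverse in place."
]
outputs = reward_pipe(texts, **reward_kwargs)

# With return_all_scores=True and num_labels=1, each input yields a
# one-element list of {"label", "score"} dicts; "score" is the raw reward.
rewards = [output[0]["score"] for output in outputs]
print(rewards)
```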