BERT Reward Model

This model is bert-base-uncased fine-tuned on the 'Cultural Kaleidoscope' dataset to serve as a reward model for RLHF (reinforcement learning from human feedback).

Important: Custom Class Required

This model uses a custom wrapper class, BertForReward, to work around an incompatibility between BERT (an encoder-only model) and RewardTrainer, which passes arguments intended for generative models, such as use_cache.

You must define this class in your script before loading the model.

1. Define the Class

Copy this code into your script:

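import torch
from transformers import BertForSequenceClassification


class BertForReward(BertForSequenceClassification):
    """BERT sequence classifier that ignores the use_cache argument."""

    def forward(self, *args, **kwargs):
        # use_cache is meant for generative models; drop it if it was passed.
        kwargs.pop("use_cache", None)
        return super().forward(*args, **kwargs)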
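
The effect of the wrapper can be checked without downloading any weights. The sketch below is illustrative only: it builds a tiny, randomly initialized BERT and calls it with use_cache, as a trainer written for generative models might. The config sizes, num_labels=1 (a single scalar score per input), and the random token IDs are all assumptions made for the demonstration, not this model's actual settings.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


# Same wrapper as above, repeated so this snippet runs on its own.
class BertForReward(BertForSequenceClassification):
    def forward(self, *args, **kwargs):
        kwargs.pop("use_cache", None)
        return super().forward(*args, **kwargs)


# Assumed toy configuration: small enough to build instantly, random weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=1,  # assumption: one scalar reward per sequence
)
model = BertForReward(config)
model.eval()

# Two fake sequences of 8 token IDs each (placeholder data).
input_ids = torch.randint(0, config.vocab_size, (2, 8))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    # use_cache is silently discarded by the wrapper before reaching BERT.
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        use_cache=False,
    )

# One score per input sequence.
rewards = outputs.logits.squeeze(-1)
print(rewards.shape)
```

To use the published weights instead of a random model, call BertForReward.from_pretrained with this repository's ID once the class has been defined.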


Model size: 0.1B params (Safetensors, tensor type F32)
