BERT Reward Model

This model is bert-base-uncased fine-tuned on the 'Cultural Kaleidoscope' dataset to act as a reward model for RLHF (reinforcement learning from human feedback).

Important: Custom Class Required

This model uses a custom wrapper class (BertForReward) to fix a compatibility issue between BERT (an encoder) and the RewardTrainer, which expects generative-model arguments such as use_cache. The wrapper discards use_cache before the call reaches BERT's forward method.

You must define this class in your script before loading the model.

1. Define the Class

Copy this code into your script:

import torch
from transformers import BertForSequenceClassification

class BertForReward(BertForSequenceClassification):
    def forward(self, *args, **kwargs):
        # Drop the generative-only argument before it reaches BERT's forward.
        kwargs.pop("use_cache", None)
        return super().forward(*args, **kwargs)
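
To check that the wrapper behaves as described, the sketch below runs a forward pass with use_cache supplied. It is a minimal, offline illustration and not this model's actual weights: the tiny BertConfig values, the random input IDs, and num_labels=1 (one scalar reward per sequence) are all assumptions chosen for the example. For real use you would presumably load the fine-tuned weights with BertForReward.from_pretrained and this model's repository ID, which is not stated here.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


class BertForReward(BertForSequenceClassification):
    def forward(self, *args, **kwargs):
        # Drop the generative-only argument before it reaches BERT's forward.
        kwargs.pop("use_cache", None)
        return super().forward(*args, **kwargs)


# Assumption: a tiny, randomly initialised config so the sketch runs offline.
# The real model uses the bert-base-uncased architecture instead.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=1,  # assumption: one scalar reward per input sequence
)
model = BertForReward(config)
model.eval()

# Two dummy sequences of eight token IDs each (random, for illustration only).
input_ids = torch.randint(0, config.vocab_size, (2, 8))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    # use_cache is passed on purpose; the wrapper removes it.
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        use_cache=False,
    )

# Logits have shape (batch, num_labels); squeeze to one reward per sequence.
rewards = outputs.logits.squeeze(-1)
print(rewards.shape)
```

Because the weights here are random, the reward values themselves are meaningless; only the call pattern and the output shape carry over to the fine-tuned model.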