--- license: mit datasets: - hendrydong/preference_700K pipeline_tag: text-classification --- # Introduction This is a breward model (based on Gemma-2b-it) trained with BT loss using the [weqweasdas/preference_dataset_mixture2_and_safe_pku](https://huggingface.co/datasets/weqweasdas/preference_dataset_mixture2_and_safe_pku) dataset. This reward model is especially useful if you need a good small reward model for LLMs. You can also refer to [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg) for a better 2B reward model trained with a hidden states regularization. ## Evaluation We evaluate this reward model on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench). | Model | Average | Chat | Chat Hard | Safety | Reasoning | |:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:| | [**Ray2333/GRM-Gemma-2B-sftreg**](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg)(Ours, 2B) | 75.1 | 95.5 | 48.2 | 80.0 | 76.8 | | berkeley-nest/Starling-RM-7B-alpha (7B) | 74.6 | 98 | 43.4 | 88.6 | 74.6 | | **Ray2333/Gemma-2B-rewardmodel-baseline**(Ours, 2B) | 73.7 | 94.1 | 46.1 | 79.6 | 75.0 | | stabilityai/stablelm-zephyr-3b (3B) | 73.1 | 86.3 | 60.1 | 70.3 | 75.7 | | openbmb/UltraRM-13b (13B) | 71.3 | 96.1 | 55.3 | 45.8 | 82 | ## Usage ``` import torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # load model and tokenizer tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline') reward_model = AutoModelForSequenceClassification.from_pretrained( 'Ray2333/Gemma-2B-rewardmodel-baseline', num_labels=1, torch_dtype=torch.float16, device_map=0, ) message = [ {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her?"}, {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"} ] message_template = tokenizer.apply_chat_template(message, tokenize=False) # it will look like this: "user\nI'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her?\nmodel\nSorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter?\n". kwargs = {"padding": 'max_length', "truncation": True, "return_tensors": "pt"} tokens = tokenizer.encode_plus(message_template, **kwargs) with torch.no_grad(): reward_tensor = model(tokens["input_ids"][0].to(model.device), attention_mask=tokens["attention_mask"][0].to(model.device)).logits.reshape(-1) reward = reward_tensor.cpu().detach().item() ```