Ray2333/Gemma-2B-rewardmodel-baseline


Ray2333 committed on Jul 5

Commit c8dbb02 • 1 Parent(s): ee6e0f5

Update README.md

Files changed (1)

README.md +3 -3


@@ -6,7 +6,7 @@ pipeline_tag: text-classification
 ---
 # Introduction
-This is a breward model (based on Gemma-2b-it) trained with BT loss using [hendrydong/preference_700K](https://huggingface.co/datasets/hendrydong/preference_700K) dataset.
+This is a breward model (based on Gemma-2b-it) trained with BT loss using the [weqweasdas/preference_dataset_mixture2_and_safe_pku](https://huggingface.co/datasets/weqweasdas/preference_dataset_mixture2_and_safe_pku) dataset.
 This reward model is especially useful if you need a good small reward model for LLMs. You can also refer to [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg) for a better 2B reward model trained with a hidden states regularization.
@@ -30,9 +30,9 @@ import torch
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 # load model and tokenizer
-tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-llama3-8B-distill')
+tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
 reward_model = AutoModelForSequenceClassification.from_pretrained(
-                'Ray2333/GRM-llama3-8B-distill',
+                'Ray2333/Gemma-2B-rewardmodel-baseline',
                 num_labels=1, torch_dtype=torch.float16,
                 device_map=0,
                 )
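
The README text in this diff says the model was trained with "BT loss" but does not expand the term. Assuming it refers to the Bradley-Terry pairwise preference loss commonly used for reward models, a minimal sketch of that loss for a single (chosen, rejected) pair might look like the following. The function name `bt_loss` and the scalar inputs are hypothetical and for illustration only; this is not taken from the author's training code.

```python
import math


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper for illustration; assumes "BT loss" in the README
    means the Bradley-Terry preference loss.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss is smaller when the chosen response scores higher.
    print(bt_loss(2.0, 0.0))
    print(bt_loss(0.0, 2.0))
```

The loss shrinks toward zero as the reward for the chosen response exceeds the reward for the rejected one, which is what pushes a model loaded with `num_labels=1` to output higher scalar scores for preferred responses.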