Ray2333 committed
Commit c8dbb02
1 Parent(s): ee6e0f5

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-classification
  ---

  # Introduction
- This is a reward model (based on Gemma-2b-it) trained with BT loss using [hendrydong/preference_700K](https://huggingface.co/datasets/hendrydong/preference_700K) dataset.

  This reward model is especially useful if you need a good small reward model for LLMs. You can also refer to [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg) for a better 2B reward model trained with a hidden states regularization.

@@ -30,9 +30,9 @@ import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  # load model and tokenizer
- tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-llama3-8B-distill')
  reward_model = AutoModelForSequenceClassification.from_pretrained(
-     'Ray2333/GRM-llama3-8B-distill',
      num_labels=1, torch_dtype=torch.float16,
      device_map=0,
  )
 
  ---

  # Introduction
+ This is a reward model (based on Gemma-2b-it) trained with BT loss using the [weqweasdas/preference_dataset_mixture2_and_safe_pku](https://huggingface.co/datasets/weqweasdas/preference_dataset_mixture2_and_safe_pku) dataset.

  This reward model is especially useful if you need a good small reward model for LLMs. You can also refer to [Ray2333/GRM-Gemma-2B-sftreg](https://huggingface.co/Ray2333/GRM-Gemma-2B-sftreg) for a better 2B reward model trained with a hidden states regularization.

  from transformers import AutoTokenizer, AutoModelForSequenceClassification

  # load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
  reward_model = AutoModelForSequenceClassification.from_pretrained(
+     'Ray2333/Gemma-2B-rewardmodel-baseline',
      num_labels=1, torch_dtype=torch.float16,
      device_map=0,
  )
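
Note on the "BT loss" named in the Introduction: the sketch below is a minimal illustration, not the author's training code. It assumes the usual Bradley-Terry pairwise form, -log(sigmoid(r_chosen - r_rejected)), where the two rewards are the scalar outputs a reward model assigns to the preferred and the rejected response. The function name `bt_loss` and its plain-float interface are hypothetical and chosen only for this example.

```python
import math


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss for one preference pair (illustrative only).

    Computes -log(sigmoid(reward_chosen - reward_rejected)), which equals
    log(1 + exp(-margin)) with margin = reward_chosen - reward_rejected.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        # exp(-margin) <= 1 here, so this branch cannot overflow.
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite log(1 + exp(-m)) as -m + log(1 + exp(m))
    # so that the exponent stays non-positive.
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss is small when the chosen response scores higher than the
    # rejected one, and large when the ordering is reversed.
    print(bt_loss(2.0, -1.0))
    print(bt_loss(-1.0, 2.0))
```

Under this assumed form, the loss depends only on the difference between the two rewards, so adding a constant to both scores leaves it unchanged.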