Ray2333/gpt2-large-helpful-reward_model

Text Classification · text-generation-inference · Inference Endpoints


Ray2333 committed on Jan 15
Commit 111f723 (1 parent: ba7a1a5): Update README.md

Files changed (1): README.md +2 -2

@@ -7,7 +7,7 @@ metrics:
 ---
-GPT2 large model trained on Anthropic/hh-rlhf helpful dataset. It is specifically used for helpful response detection or RLHF.
+GPT2 large model trained on Anthropic/hh-rlhf helpful dataset. It is specifically used for helpful response detection or RLHF. Note: remember to use the formulation of Anthropic/hh-rlhf dataset for inference.
 It achieves an accuracy of 0.72621 on the test set, which nearly matches other models with larger sizes.
@@ -22,7 +22,7 @@ reward_model = AutoModelForSequenceClassification.from_pretrained(
                 num_labels=1, torch_dtype=torch.bfloat16,
                 device_map=gpu_id1,
                 )
-q, a = "I just came out of from jail, any suggestion of my future?", "Sorry, I don't understand."
+q, a = "\n\nHuman: I just came out of from jail, any suggestion of my future? \n\nAssistant:", "Sorry, I don't understand."
 inputs = rm_tokenizer(q, a, return_tensors='pt', truncation=True)
 with torch.no_grad():
   reward = reward_model(**(inputs.to(gpu_id1))).logits[0].cpu().detach().item()
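This commit wraps the question in the Anthropic/hh-rlhf dialogue markers (`\n\nHuman: ...` followed by `\n\nAssistant:`) before it is scored. The helper below is a minimal sketch of that formatting, assuming the dataset convention of one marker per turn with the prompt ending on an open Assistant turn. `format_hh_prompt` and `ROLE_TAGS` are hypothetical names, not part of the model repository. Unlike the README example, the helper strips whitespace around each turn, so no space precedes the final `\n\nAssistant:` marker. It also accepts multi-turn histories, which the README does not cover.

```python
from typing import List, Tuple

# Speaker tags as they appear in Anthropic/hh-rlhf transcripts (assumed mapping).
ROLE_TAGS = {"human": "Human", "assistant": "Assistant"}


def format_hh_prompt(history: List[Tuple[str, str]]) -> str:
    """Build an hh-rlhf style prompt that ends with an open Assistant turn.

    `history` is a list of (role, text) pairs with role in {"human", "assistant"}.
    An unknown role raises KeyError.
    """
    parts = []
    for role, text in history:
        tag = ROLE_TAGS[role]
        parts.append(f"\n\n{tag}: {text.strip()}")
    # The response being scored is meant to follow this final marker.
    parts.append("\n\nAssistant:")
    return "".join(parts)


if __name__ == "__main__":
    q = format_hh_prompt(
        [("human", "I just came out of from jail, any suggestion of my future?")]
    )
    a = "Sorry, I don't understand."
    print(repr(q))
```

The resulting `q` and the candidate answer `a` can then be passed to `rm_tokenizer(q, a, return_tensors='pt', truncation=True)` as in the README snippet.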
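The README reports an accuracy of 0.72621 on the test set but does not define the metric. For a pairwise preference dataset such as Anthropic/hh-rlhf, accuracy is commonly the fraction of test pairs in which the reward model scores the chosen response above the rejected one. Assuming that definition, it could be computed from per-pair rewards (each obtained with the README's scoring snippet) as sketched below. `preference_accuracy` is a hypothetical helper, and counting ties as incorrect is also an assumption.

```python
from typing import Sequence


def preference_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the chosen response gets the higher reward.

    Assumed definition: a tie (equal rewards) counts as incorrect.
    """
    if len(chosen) != len(rejected):
        raise ValueError("chosen and rejected must have the same length")
    if len(chosen) == 0:
        raise ValueError("need at least one preference pair")
    wins = sum(c > r for c, r in zip(chosen, rejected))
    return wins / len(chosen)


if __name__ == "__main__":
    # Hypothetical rewards for four chosen/rejected pairs.
    print(preference_accuracy([1.0, 0.5, -0.2, 2.0], [0.0, 0.7, -0.5, 2.0]))
```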