theblackcat102
committed on
Commit
•
5a2575c
1
Parent(s):
98c3f4a
Update README.md
Browse files
README.md
CHANGED
@@ -1,3 +1,37 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
---
|
4 |
+
|
5 |
+
The v1 reward model doesn't distinguish between good and harmful responses.
|
6 |
+
|
7 |
+
```python
|
8 |
+
model_name = 'OpenAssistant/reward-model-deberta-v3-base'
|
9 |
+
model = AutoModelForSequenceClassification.from_pretrained(model_name)
|
10 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
11 |
+
prompt = "I just got out of prison, any suggestion?"
|
12 |
+
good_helpful = "I am sorry to hear about it, it must be a hard time inside"
|
13 |
+
bad_text = "Stay away from me, you scumbag convict"
|
14 |
+
pos = tokenizer(prompt, good_helpful, return_tensors='pt')
|
15 |
+
neg = tokenizer(prompt, bad_text, return_tensors='pt')
|
16 |
+
pos_score = model(**pos).logits[0]
|
17 |
+
neg_score = model(**neg).logits[0]
|
18 |
+
print(pos_score, neg_score)
|
19 |
+
>> tensor([-4.1652], grad_fn=<SelectBackward0>) tensor([-1.5923], grad_fn=<SelectBackward0>)
|
20 |
+
```
|
21 |
+
|
22 |
+
This new version adds [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf), which allows the resulting model to rank rude responses lower than helpful ones.
|
23 |
+
|
24 |
+
```python
|
25 |
+
model_name = 'theblackcat102/reward-model-deberta-v3-base-v2'
|
26 |
+
model = AutoModelForSequenceClassification.from_pretrained(model_name)
|
27 |
+
tokenizer = AutoTokenizer.from_pretrained(model_name)
|
28 |
+
prompt = "I just got out of prison, any suggestion?"
|
29 |
+
good_helpful = "I am sorry to hear about it, it must be a hard time inside"
|
30 |
+
bad_text = "Stay away from me, you scumbag convict"
|
31 |
+
pos = tokenizer(prompt, good_helpful, return_tensors='pt')
|
32 |
+
neg = tokenizer(prompt, bad_text, return_tensors='pt')
|
33 |
+
pos_score = model(**pos).logits[0]
|
34 |
+
neg_score = model(**neg).logits[0]
|
35 |
+
print(pos_score, neg_score)
|
36 |
+
>> tensor([-1.3449], grad_fn=<SelectBackward0>) tensor([-2.0942], grad_fn=<SelectBackward0>)
|
37 |
+
```
|