Haoxiang-Wang committed
Commit: c731b39
1 Parent(s): 76dec03

Update README.md
Files changed (1): README.md (+2 -2)
README.md CHANGED

````diff
@@ -34,11 +34,11 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 
 Use the code below to get started with the model.
 
-
 + System Prompt:
 + Template: `"You are a helpful, respectful, and honest assistant who always responds to the user in a harmless way. Your response should maximize weighted rating = helpfulness*{weight_helpfulness} + verbosity*{weight_verbosity}"`
 + Value Choices: `weight_helpfulness` is an integer from 0 to 100 and `(weight_verbosity/100)**2 + (weight_helpfulness/100)**2 == 1`
 + The maximum `weight_helpfulness` is 100 the lowest suggested value is 71.
++ The model will generate a response that implicitly maximizes the weighted rating `helpfulness*weight_helpfulness + verbosity*weight_verbosity`, where `helpfulness` and `verbosity` are two reward objectives that range from 0 to 100.
 
 We suggest starting with a ratio of `weight_verbosity/weight_helpfulness` first. For instance, considering `weight_verbosity/weight_helpfulness` is equal to `tan(-15°)`
 ```python
@@ -47,7 +47,7 @@ import torch
 import numpy as np
 
 # Here we show how to use the DPA model to generate a response to a user prompt.
-device = "cuda:2"
+device = "cuda"
 model = AutoModelForCausalLM.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B", torch_dtype=torch.bfloat16, device_map=device)
 tokenizer = AutoTokenizer.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B")
 degree = -15 # weight_verbosity/weight_helpfulness = tan(-15°)
````
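
The diffed snippet stops at `degree = -15`, so for reference, here is a minimal sketch of how the rest of the quick-start might continue. It is not the model card's own code: it assumes the tokenizer's bundled chat template accepts a system turn, and the user prompt and decoding settings (`max_new_tokens`, greedy decoding) are illustrative placeholders. The weight computation follows the `tan(-15°)` example and the unit-circle constraint quoted above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import numpy as np

# Load the DPA model and its tokenizer (bfloat16, single GPU).
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(
    "Haoxiang-Wang/DPA-v1-Mistral-7B", torch_dtype=torch.bfloat16, device_map=device
)
tokenizer = AutoTokenizer.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B")

degree = -15  # weight_verbosity/weight_helpfulness = tan(-15°)
# Place the weights on the unit circle scaled to 0-100, so that
# (weight_verbosity/100)**2 + (weight_helpfulness/100)**2 ≈ 1.
weight_helpfulness = int(np.round(np.cos(np.radians(degree)) * 100))  # ≈ 97
weight_verbosity = int(np.round(np.sin(np.radians(degree)) * 100))    # ≈ -26

# Fill the system-prompt template from the model card with the chosen weights.
system_prompt = (
    "You are a helpful, respectful, and honest assistant who always responds to the user "
    "in a harmless way. Your response should maximize weighted rating = "
    f"helpfulness*{weight_helpfulness} + verbosity*{weight_verbosity}"
)

# Hypothetical user prompt, purely for illustration.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain what a reward model is in one paragraph."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the chat template rejects a system turn (some Mistral-based templates do), a common workaround is to prepend `system_prompt` to the user message instead.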