yuxin committed on
Commit cc3efb5
1 parent: ab773a3

Upload folder using huggingface_hub (#7)

- f4295322ed858a8fbf38688ef4f66fe4b5bc05fb7f546ae9264a472e5239f5d2 (34671fc2876506de8f6f20ea53f0bab25d591067)
- 8c9448225b6d39c35ae97c0ceffcf19f863386bf3080d08731fe757bf70fb1aa (80ea39f4383c81e642305c49d484f6564e665421)
- 54e3a9aa87dd59f13edf74bba2b3c2927cd0d30b14a35927e0492aa33f641ffa (012fd560c29ff6c4fdb977209db5f0dfc457424f)
- 9e84c48d007060e25931bf9b53241145366debd7948b69fa07bc26a73193e3e9 (34e3a0befee318ed879a2681ebd8912d2f648cf4)
- a0d6ad42d56fbe1cde7c72a9d567995eec85083a6d3fb172fd68eb8c2285878e (b4d9c6ab2c50523eee3a47044025f3d19cf8076e)
- b1744ae213ec8db6a8f4096d511a09be14bfbb99e690c06d10a0e4f560319a3b (313a325ce0a1d7fc46cc27d6f118ff6ebb4453ed)
- e2f8ff30e35595fc88c47f4940b7287a57f57b3172542305244b7551db1dda8a (42c8b2c97ec6bcdf07e1a8a239296b8acf76ef8f)
- 6fd424449d2a16ad785e7289c4a376277b3c94c6a13af7c05c8c78bd00507d2f (dcf8c5403e53a344c6ef0824632a6514344434e7)

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -33,6 +33,7 @@ with torch.no_grad():
 # reward: 0.76
 ```
 模型可以较为准确地判断文本重复,异常中断和不符合指令要求等低质量模型生成结果,并给出较低的奖励值。
+
 The model can more accurately determine low quality model generation results such as text repetition, interruptions and failure to meet instruction requirements, and give lower reward values.
 
 ```python
@@ -52,8 +53,11 @@ with torch.no_grad():
 print(reward.tolist())
 #reward: [0.76, -1.36, -2.99, -1.82]
 ```
+
 模型能够对比对同一指令的不同生成结果,并根据质量给出奖励值。
+
 The model is able to compare different generation results for the same instruction and give reward values based on quality.
+
 ```python
 prefix_user = "Human:"
 prefix_bot = "\n\nAssistant:"