hendrydong committed on
Commit
d3a830f
1 Parent(s): 89579a9

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -4,6 +4,8 @@ license: cc-by-nc-4.0
 
 This reward function can be used for RLHF, including PPO, iterative SFT, iterative DPO.
 
+The license is derived from `PKU-Alignment/PKU-SafeRLHF-30K`.
+
 ## Training
 The base model is `meta-llama/Meta-Llama-3-8B-Instruct`.
 