killawhale2 committed
Commit 9442df0
Parent: 2b079b2

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
````diff
@@ -50,7 +50,7 @@ filtering_task_list = [
 ]
 ```
 
-Using the datasets mentioned above, we apply SFT and iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
+Using the datasets mentioned above, we applied SFT and iterative DPO training, a proprietary alignment strategy, to maximize the performance of our resulting model.
 
 [1] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D. and Finn, C., 2023. Direct preference optimization: Your language model is secretly a reward model. NeurIPS.
````
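
For context on the method the changed line cites, below is a minimal sketch of the DPO objective from Rafailov et al. [1]. It is an illustration only, not the proprietary alignment code the README refers to; the function name, the `beta` default, and the assumption that per-response log-probabilities are already summed per sequence are all assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss given summed log-probs of the chosen/rejected responses
    under the trainable policy and a frozen reference model."""
    # How far the policy has moved from the reference on each response
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * margin): push the chosen response's log-ratio
    # above the rejected one's; beta scales the implicit KL penalty
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs
logps = [torch.randn(4) for _ in range(4)]
print(dpo_loss(*logps))
```

In the iterative variant the README mentions, this objective would be reapplied over successive rounds, with each round's policy typically serving as the next round's reference; the exact schedule is not specified in the commit.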