mylibrar committed
Commit
d16f8b2
1 Parent(s): ff04781

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -79,6 +79,9 @@ python3 -m fastchat.serve.cli --model-path LLM360/AmberSafe
  | [PKU-Alignment/PKU-SafeRLHF](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF) | 330k | cc-by-nc-4.0 |
  | Total | 330k | |

+ ## Data Preprocessing
+ We filtered the dataset by keeping only the samples whose `is_response_0_safe` and `is_response_1_safe` flags differ. This ensures that in every preference pair, the chosen response is safe and the rejected one is unsafe.
+
  ## Method
  We followed the instructions in the [dpo repo](https://github.com/eric-mitchell/direct-preference-optimization) to finetune this model.
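
The filtering rule the commit adds can be sketched as follows. This is a hypothetical illustration, not the authors' actual code (the commit contains no code), and the sample records are made up; only the field names `is_response_0_safe` and `is_response_1_safe` come from the dataset described above.

```python
# Keep only preference pairs where exactly one response is flagged safe,
# so the safe response can serve as "chosen" and the unsafe one as "rejected".
def keep_pair(example):
    """True when the two safety flags differ (exactly one response is safe)."""
    return example["is_response_0_safe"] != example["is_response_1_safe"]

# Hypothetical sample rows, not real PKU-SafeRLHF data.
samples = [
    {"is_response_0_safe": True,  "is_response_1_safe": False},  # kept
    {"is_response_0_safe": True,  "is_response_1_safe": True},   # dropped: both safe
    {"is_response_0_safe": False, "is_response_1_safe": False},  # dropped: both unsafe
    {"is_response_0_safe": False, "is_response_1_safe": True},   # kept
]

filtered = [ex for ex in samples if keep_pair(ex)]
print(len(filtered))  # prints 2
```

With the real dataset loaded via the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`.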