This collection contains the safetyQA dataset for safe SPIN training, along with the trained models.
Yifan Wang
AmberYifan
Collections (1)
Models (15)
AmberYifan/llama2-hhrlhf-spin-iter1 • Text Generation • 9
AmberYifan/llama2-hhrlhf-spin-iter0 • Text Generation • 92
AmberYifan/zephyr-7b-sft-safeDPO3 • Text Generation • 23
AmberYifan/zephyr-7b-sft-safeDPO2
AmberYifan/zephyr-7b-sft-safeDPO • Text Generation • 22
AmberYifan/safe-spin-iter1-v2 • Text Generation • 1.75k
AmberYifan/advsafe_plus-spin-iter0 • Text Generation • 17
AmberYifan/advsafe-spin-iter0 • Text Generation • 22
AmberYifan/llama-7b-sft-DPO • Text Generation • 6
AmberYifan/Mistral-7B-Instruct-v0.2-DPO • Text Generation • 7
Datasets (15)
AmberYifan/hhrlhf-spin-iter1 • Viewer • 8
AmberYifan/hh-rlhf-dpo-chat • Viewer • 24
AmberYifan/hh-rlhf-dpo • Viewer • 10
AmberYifan/hhrlhf-spin-iter0 • Viewer • 16
AmberYifan/hh-rlhf-spin • Viewer • 2
AmberYifan/safetyQA_DPO • Viewer • 56
AmberYifan/advsafe_iter0 • Viewer • 16
AmberYifan/AdvBench_safe • Viewer • 9
AmberYifan/spin_iter2 • Viewer
AmberYifan/safe_spin_iter2 • Viewer