Self-Exploring Language Models: Active Preference Elicitation for Online Alignment • Paper • 2405.19332 • Published May 29, 2024 • 22
ShenaoZ/0.0005_zephyr_withdpo_5551_4iters_bs256_newtrl_iter_1 • Feature Extraction • Updated May 13, 2024 • 6
ShenaoZ/0.0005_zephyr_withdpo_5551_4iters_bs256_newtrl_iter_2 • Feature Extraction • Updated May 13, 2024 • 6
ShenaoZ/0.0005_zephyr_withdpo_5551_4iters_bs256_newtrl_iter_3 • Feature Extraction • Updated May 13, 2024 • 5
ShenaoZ/0.0005_zephyr_withdpo_5551_4iters_bs256_newtrl_iter_4 • Feature Extraction • Updated May 13, 2024 • 7
ShenaoZ/0.0005_mistral_withdpo_4iters_bs256_5551lr_dataset • Viewer • Updated May 10, 2024 • 51.8k • 1.7k
ShenaoZ/0.0005_withdpo_4iters_bs256_2epoch_5551lr_dataset • Viewer • Updated May 10, 2024 • 51.8k • 2.18k