arxiv:2407.15762
Kaiwen Wang
kaiwenw
AI & ML interests
Reinforcement Learning
Recent Activity
updated a dataset about 1 month ago: kaiwenw/dec9_sp1_repeat_5_pref_jdpo_75_chosen_25_reject
updated a dataset about 1 month ago: kaiwenw/dec9_sp1_repeat_5_pref_jdpo_25_chosen_75_reject
updated a dataset about 1 month ago: kaiwenw/dec9_sp1_repeat_5_pref_jdpo_50_chosen_50_reject
Organizations
None yet
Papers
3
models
7
kaiwenw/nov11_oasst_aft_llama_lr_3e-5_rerun • Text Generation • 18
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_4 • Text Generation • 10
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_3 • Text Generation • 9
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_2 • Text Generation • 11
kaiwenw/nov22_lr_3e-6_lora_32_dropout_0.1_all_reject_first_ep_1 • Text Generation • 9
kaiwenw/nov2_oasst_aft_llama_lr_3e-5 • Text Generation • 10
kaiwenw/oct31_oasst_llama70b_jft • Text Generation • 39
datasets
81
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_75_chosen_25_reject • Viewer • 14.1k • 59
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_25_chosen_75_reject • Viewer • 18.6k • 57
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_50_chosen_50_reject • Viewer • 37.9k • 64
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_reject_first • Viewer • 26.7k • 55
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_chosen_first • Viewer • 20.1k • 60
kaiwenw/dec9_sp1_repeat_5_pref_jdpo • Viewer • 44.5k • 62
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_n_7_temp_0.9 • Viewer • 36.4k • 64
kaiwenw/dec9_sp1_repeat_5 • Viewer • 18.2k • 54
kaiwenw/dec9_sp1_pref_jdpo_75_chosen_25_reject • Viewer • 2.39k • 60
kaiwenw/dec9_sp1_pref_jdpo_25_chosen_75_reject • Viewer • 3.39k • 62