Shengyi Costa Huang

vwxyzjn

http://costa.sh

AI & ML interests

None yet

Organizations

Collections 3

Papers 5

spaces 3

🔥 Aim • Runtime error

😻 Vwxyzjn Testyes4 • Sleeping

📊 Pyserini Wikipedia Kilt Doc • Runtime error

models 389

vwxyzjn/rm_zephyr_new

Text Classification • Updated Sep 26 • 18

vwxyzjn/online_dpo_vllm_thread_beta_0.03__allenai_open_instruct_dev

Updated Sep 11

vwxyzjn/reward_modeling__EleutherAI_pythia-14m

Updated Aug 24 • 15

vwxyzjn/online_dpo_vllm__vwxyzjn_btulu

Updated Aug 23

vwxyzjn/online_dpo_vllm__allenai_llama-3-tulu-2-8b

Updated Aug 19 • 6

vwxyzjn/btulu

Text Generation • Updated Aug 19 • 369

vwxyzjn/online_dpo_tulu_2

Text Generation • Updated Aug 19 • 9

vwxyzjn/gkd-model

Updated Aug 15

vwxyzjn/reward_modeling__allenai_llama-3-tulu-2-8b

Updated Aug 11 • 39

vwxyzjn/online_dpo__cleanrl_EleutherAI_pythia-1b-dedupedsfttldr

Updated Aug 9

datasets 282

vwxyzjn/norobot_pref_4860

Viewer • Updated 29 days ago • 59.9k • 36

vwxyzjn/norobot_generation_4860

Viewer • Updated 29 days ago • 29.9k • 16

vwxyzjn/norobot_pref_465

Viewer • Updated 29 days ago • 59.4k • 24

vwxyzjn/norobot_generation_465

Viewer • Updated 29 days ago • 29.7k • 8

vwxyzjn/norobot_generation_16325

Viewer • Updated 29 days ago • 29.7k • 12

vwxyzjn/norobot_pref_11421

Viewer • Updated 29 days ago • 56.1k • 9

vwxyzjn/norobot_generation_11421

Viewer • Updated 29 days ago • 28k • 12

vwxyzjn/rejection_sampling_scores_1727889563

Viewer • Updated 29 days ago • 240 • 7

vwxyzjn/rejection_sampling_1727889563

Viewer • Updated 29 days ago • 60 • 10

vwxyzjn/rejection_sampling_scores_1727889130

Viewer • Updated 29 days ago • 180 • 10

Articles

How NuminaMath Won the 1st AIMO Progress Prize

Preference Optimization for Vision Language Models

Putting RL back in RLHF

Constitutional AI with Open LLMs

The N Implementation Details of RLHF with PPO
