Hao Sun

Holarissun

https://holarissun.github.io/

AI & ML interests

PhD@Uni.Cambridge. Deep RL, RL x LLM, RLHF.

Organizations

None yet

Holarissun's activity

upvoted a paper about 1 month ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 99

upvoted a paper about 2 months ago

Rethinking Diverse Human Preference Learning through Principal Component Analysis

Paper • 2502.13131 • Published Feb 18 • 36

updated 2 models 10 months ago

Holarissun/SFT_gemma2b_hh-rlhf-helpful-gpt4_lr5e-06_epoch2-subset-1

Updated Jun 17, 2024 • 1

Holarissun/SFT_gemma2b_hh-rlhf-helpful_lr5e-06_epoch2-subset-1

Updated Jun 17, 2024 • 2

liked a model 10 months ago

weqweasdas/RM-Mistral-7B

Text Classification • Updated Mar 31, 2024 • 851 • 21

updated 6 models 10 months ago

Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024 • 7

Holarissun/REPROD_dpo_harmlessharmless_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024 • 1

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06

Updated May 28, 2024 • 3

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06

Updated May 28, 2024 • 2

updated 9 models 11 months ago

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-05

Updated May 25, 2024 • 1

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr5e-05

Updated May 24, 2024 • 1

Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05

Updated May 24, 2024 • 1

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr5e-06

Updated May 24, 2024 • 1

Holarissun/REPROD_dpo_helpfulhelpful_gpt3_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05

Updated May 24, 2024 • 1 • 1

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr1e-05

Updated May 24, 2024 • 1

Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06

Updated May 24, 2024 • 1

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr1e-05

Updated May 24, 2024 • 1

Holarissun/REPROD_dpo_helpfulhelpful_gpt3_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06

Updated May 24, 2024 • 1