Dian Yu

yudian

https://scholar.google.com/citations?user=ERdzqyYAAAAJ&hl=en

AI & ML interests

NLP

Organizations

None yet

yudian's activity

authored a paper 21 days ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published 22 days ago • 19

upvoted a paper 21 days ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published 22 days ago • 19

upvoted a collection 22 days ago

RLVR

Collection

Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated 22 days ago • 11

authored 3 papers about 1 month ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 42

OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas

Paper • 2501.15427 • Published Jan 26 • 6

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Paper • 2502.16852 • Published Feb 24

upvoted a paper 3 months ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61

authored a paper 3 months ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published Jan 30 • 61

upvoted a paper 4 months ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 42

upvoted a paper 6 months ago

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published Oct 4, 2024 • 12

liked a model 8 months ago

deepseek-ai/DeepSeek-Prover-V1.5-RL

Updated Aug 29, 2024 • 3.62k • 57

upvoted a collection 10 months ago

Reinforcement Learning (RL / RLHF)

Collection

19 items • Updated Oct 22, 2024 • 1

authored 3 papers 10 months ago

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Paper • 2407.00617 • Published Jun 30, 2024 • 7

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published Jun 29, 2024 • 40

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 102

upvoted 3 papers 10 months ago

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published Jun 29, 2024 • 40

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Paper • 2407.00617 • Published Jun 30, 2024 • 7

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 102

authored 2 papers 10 months ago

DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension

Paper • 1902.00164 • Published Feb 1, 2019

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

Paper • 1904.09679 • Published Apr 21, 2019