nathan lile

nlile

https://NathanThinks.com

AI & ML interests

None yet

Recent Activity

updated a model about 19 hours ago

nlile/policy_iteration_2

Organizations

nlile's activity

upvoted a collection about 2 months ago

Big-Math

Collection

This collection contains assets associated with the Big-Math dataset, a high-quality collection of over 250,000 math questions with verifiable answers • 4 items • Updated 8 days ago • 4

upvoted 4 papers about 2 months ago

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Paper • 2502.17387 • Published Feb 24 • 6

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 38

Chain of Draft: Thinking Faster by Writing Less

Paper • 2502.18600 • Published Feb 25 • 48

FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users

Paper • 2502.19312 • Published Feb 26 • 7

upvoted a paper 3 months ago

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 98

upvoted 2 papers 5 months ago

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 124

Direct Preference Optimization Using Sparse Feature-Level Constraints

Paper • 2411.07618 • Published Nov 12, 2024 • 16

upvoted a paper 6 months ago

Generative Reward Models

Paper • 2410.12832 • Published Oct 2, 2024 • 6

upvoted a collection 6 months ago

PERSONA

Collection

Collection of various datasets related to the PERSONA paper. • 5 items • Updated 8 days ago • 3

upvoted a collection 7 months ago

NuminaMath

Collection

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated Feb 10 • 77

upvoted a paper 9 months ago

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24, 2024 • 20

upvoted an article 11 months ago

The N Implementation Details of RLHF with PPO

Article • Published Oct 24, 2023 • 50

upvoted a paper about 1 year ago

Suppressing Pink Elephants with Direct Principle Feedback

Paper • 2402.07896 • Published Feb 12, 2024 • 11