Peng Wang

stillarrow

AI & ML interests

None yet

Recent Activity

liked a model 6 days ago

meta-llama/Llama-3.2-11B-Vision

liked a dataset 8 days ago

GAIR/LIMO

liked a dataset 10 days ago

AI-MO/NuminaMath-CoT

View all activity

Organizations

None yet

stillarrow's activity

upvoted an article 12 days ago

Article

Open R1: Update #2

and 6 others •

14 days ago

• 184

upvoted a collection 17 days ago

OpenMath

Collection

A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" • 15 items • Updated Jan 17 • 42

upvoted 2 papers about 1 month ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 91

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

Paper • 2410.02884 • Published Oct 3, 2024 • 54

upvoted a paper about 2 months ago

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 80

upvoted a paper 7 months ago

Llemma: An Open Language Model For Mathematics

Paper • 2310.10631 • Published Oct 16, 2023 • 53

upvoted 3 papers 8 months ago

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

Paper • 2407.04078 • Published Jul 4, 2024 • 18

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Paper • 2407.01284 • Published Jul 1, 2024 • 77

LiteSearch: Efficacious Tree Search for LLM

Paper • 2407.00320 • Published Jun 29, 2024 • 38

upvoted a collection 9 months ago

Reward models on the hub

Collection

UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated Apr 13, 2024 • 25

upvoted 2 papers 11 months ago

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12, 2024 • 64

The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

Paper • 2312.01552 • Published Dec 4, 2023 • 33

upvoted 3 papers about 1 year ago

upvoted a collection about 1 year ago

Awesome feedback datasets

Collection

A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68

upvoted a paper about 1 year ago

Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 244

upvoted a paper over 1 year ago

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

Paper • 2309.03883 • Published Sep 7, 2023 • 35