YangWang92

yangwang92

AI & ML interests

None yet

Recent Activity

liked a model about 21 hours ago

microsoft/bitnet-b1.58-2B-4T

liked a model 3 days ago

Skywork/Skywork-OR1-32B-Preview

liked a dataset 3 days ago

AI-MO/NuminaMath-1.5

yangwang92's activity

upvoted a paper 4 days ago

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 14 days ago • 72

upvoted a paper 11 days ago

Inference-Time Scaling for Generalist Reward Modeling

Paper • 2504.02495 • Published 13 days ago • 52

upvoted a paper about 1 month ago

Process-based Self-Rewarding Language Models

Paper • 2503.03746 • Published Mar 5 • 39

upvoted a collection about 1 month ago

Qwen2.5-1M

Collection

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Feb 26 • 116

upvoted 2 papers about 2 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 181

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Paper • 2502.10248 • Published Feb 14 • 55

upvoted a collection 2 months ago

CodeI/O

Collection

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated Feb 13 • 6

upvoted a paper 2 months ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 48

upvoted an article 2 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28 • 841

upvoted 2 papers 2 months ago

Matryoshka Quantization

Paper • 2502.06786 • Published Feb 10 • 30

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

Paper • 2502.05003 • Published Feb 7 • 44

upvoted a collection 2 months ago

Reasoning Datasets

Collection

Distilled synthetic Reasoning datasets • 7 items • Updated Feb 2 • 60

upvoted 5 papers 3 months ago

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 48

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 113

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 381