RL+reason model - a zzfive Collection

updated about 6 hours ago

RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published 14 days ago • 22

Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published 11 days ago • 24

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published 10 days ago • 100

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published Dec 16, 2024 • 4

RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Paper • 2412.09858 • Published Dec 13, 2024 • 1

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published 8 days ago • 51

o3-mini vs DeepSeek-R1: Which One is Safer?
Paper • 2501.18438 • Published 8 days ago • 21

s1: Simple test-time scaling
Paper • 2501.19393 • Published 7 days ago • 88

Process Reinforcement through Implicit Rewards
Paper • 2502.01456 • Published 4 days ago • 53

The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper • 2502.01081 • Published 4 days ago • 9

Improving Transformer World Models for Data-Efficient RL
Paper • 2502.01591 • Published 4 days ago • 8

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Paper • 2502.02508 • Published 3 days ago • 16

Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published 1 day ago • 32

Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Paper • 2502.02339 • Published 3 days ago • 11

A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Paper • 2502.01618 • Published 4 days ago • 5