Daily Papers

by AK and the research community

May 23

Submitted by

BoZhang

NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

·
25 authors

Submitted by

Xiaoye08

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

·
5 authors

Submitted by

dongguanting

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

·
10 authors

Submitted by

wenhu

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

·
5 authors

Submitted by

Liang0223

KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

·
10 authors

Submitted by

DongfuJiang

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

·
5 authors

Submitted by

gogoduan

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

·
8 authors

Submitted by

yyyou

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

·
8 authors

Submitted by

ChenyuZheng

Scaling Diffusion Transformers Efficiently via μP

·
8 authors

Submitted by

i-udovichenko

Risk-Averse Reinforcement Learning with Itakura-Saito Loss

·
5 authors

Submitted by

Franck-Dernoncourt

Understanding Generative AI Capabilities in Everyday Image Editing Tasks

·
7 authors

Submitted by

ychenNLP

AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

·
8 authors

Submitted by

yanyc

Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning

·
10 authors

Submitted by

tricktreat

Let LLMs Break Free from Overthinking via Self-Braking Tuning

·
10 authors

Submitted by

taesiri

VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

·
5 authors

Submitted by

rp-yu

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

·
3 authors

Submitted by

XuankunRong

Backdoor Cleaning without External Guidance in MLLM Fine-tuning

·
8 authors

Submitted by

nthakur

Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

·
4 authors

Submitted by

julianjuaner

Training-Free Efficient Video Generation via Dynamic Token Carving

·
9 authors

Submitted by

KaituoFeng

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

·
5 authors

Submitted by

weizhepei

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

·
12 authors

Submitted by

haoningwu

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

·
6 authors

Submitted by

jacklishufan

LaViDa: A Large Diffusion Language Model for Multimodal Understanding

·
10 authors

Submitted by

zhangchenxu

TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

·
7 authors

Submitted by

KevinQHLin

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

·
4 authors

Submitted by

Siyuanyuan

OViP: Online Vision-Language Preference Learning

·
6 authors

Submitted by

xw-eric

GRIT: Teaching MLLMs to Think with Images

·
9 authors

Submitted by

hcwei

Training-Free Reasoning and Reflection in MLLMs

·
2 authors

Submitted by

Kikkk

AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

·
8 authors

Submitted by

ilgee

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

·
11 authors

Submitted by

xhyandwyy

VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

·
9 authors

Submitted by

ryokamoi

Training Step-Level Reasoning Verifiers with Formal Verification Tools

·
5 authors

Submitted by

RunsenXu

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

·
9 authors

Submitted by

xw-eric

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

·
7 authors

Submitted by

sagnikM

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

·
4 authors

Submitted by

MING-ZCH

Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework

·
2 authors

Submitted by

keplerccc

Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets

·
4 authors

Submitted by

ingeol

How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads

·
4 authors

Submitted by

jaagli

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

·
11 authors

Submitted by

berkegokmen1

RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

·
4 authors

Submitted by

gsarti

Steering Large Language Models for Machine Translation Personalization

·
5 authors

Submitted by

ayyyq

When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction

·
2 authors

Submitted by

gagan3012

Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning

·
3 authors

Submitted by

reachomk

gen2seg: Generative Models Enable Generalizable Instance Segmentation

·
2 authors

Submitted by

seyoungsong

MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language

·
7 authors

Submitted by

philippds

SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution

·
1 author

Submitted by

zenyn

SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information

·
4 authors
