Submitted by cg1177 60 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models · 21 authors 5
Submitted by salmannyu 29 X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents · 10 authors 2
Submitted by mpark 27 SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation · 5 authors 2
Submitted by wchengad 23 StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians · 10 authors 2
Submitted by saxon 23 THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models · 4 authors 2
Submitted by Ningyu 19 EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models · 10 authors 2
Submitted by frog123123123123 18 Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs · 10 authors 2
Submitted by Swtheking 18 LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs · 8 authors 2
Submitted by ewrfcas 14 Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation · 8 authors 2
Submitted by pengxiang 12 InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners · 8 authors 2
Submitted by Njb 10 DRAGON: Distributional Rewards Optimize Diffusion Generative Models · 4 authors 2
Submitted by Yuxiang007 10 LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark · 9 authors 2
Submitted by bys0318 9 An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes · 7 authors 3
Submitted by manuelkansy 8 LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping · 5 authors 6
Submitted by quyanh 6 RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search · 3 authors 8
Submitted by SieraL 6 NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning · 11 authors 4
Submitted by RanjanSapkota 4 RF-DETR Object Detection vs YOLOv12: A Study of Transformer-based and CNN-based Architectures for Single-Class and Multi-Class Greenfruit Detection in Complex Orchard Environments Under Label Ambiguity · 4 authors 2
Submitted by reyavir 3 PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines · 5 authors 2
Submitted by ChenWu98 1 Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction · 4 authors 2
Submitted by nielsr 1 LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models · 5 authors 2
Submitted by tnngo2 1 SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging · 6 authors 2