Submitted by shizhediao 71 CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training · 15 authors 2
Submitted by davidchan 29 Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling · 6 authors 2
Submitted by BestWishYsh 24 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation · 2 authors 2
Submitted by QizhiPei 23 A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis · 8 authors 2
Submitted by nielsr 17 Perception Encoder: The best visual embeddings are not at the output of the network · 18 authors 2
Submitted by ahmed-masry 17 ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering · 14 authors 2
Submitted by Harold328 16 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models · 8 authors 4
Submitted by dreamerdeo 13 NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation · 8 authors 2
Submitted by sthuihui 12 DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging · 7 authors 3
Submitted by janghyuncho7 11 PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding · 29 authors 2
Submitted by wanghaofan 11 InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework · 12 authors 2
Submitted by LeanQuant 8 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float · 6 authors 2
Submitted by dongyong2 7 CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy · 5 authors 2
Submitted by cihangxie 4 Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark · 6 authors 2
Submitted by Shilin-LU 3 Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts · 4 authors 2
Submitted by hriaz 2 MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation · 5 authors 2
Submitted by WY123L 1 Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking · 7 authors 2