AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis Paper • 2504.13157 • Published 5 days ago • 16
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking Paper • 2504.09228 • Published 10 days ago • 4
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published 12 days ago • 26
BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting Paper • 2504.09048 • Published 10 days ago • 7
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published 8 days ago • 14
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 8 days ago • 237
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published 15 days ago • 118
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published 12 days ago • 30
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 11 days ago • 120
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 20 days ago • 81
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments Paper • 2504.03886 • Published 18 days ago • 10
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 15 days ago • 167
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 21 days ago • 15