DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published Nov 21, 2024 • 13
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 25
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 33
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 46
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Paper • 2410.22304 • Published Oct 29, 2024 • 16
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs Paper • 2311.04901 • Published Nov 8, 2023 • 7