41 OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments · 17 authors 1
28 Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models · 11 authors 2
15 From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples · 4 authors 1
10 Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models · 6 authors 1