BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published 12 days ago • 8
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published Jul 3 • 28
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8 • 38