ShowUI: One Vision-Language-Action Model for GUI Visual Agent • Paper • 2411.17465 • Published 7 days ago • 64
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting • Paper • 2411.17223 • Published 7 days ago • 5
TEXGen: a Generative Diffusion Model for Mesh Textures • Paper • 2411.14740 • Published 12 days ago • 13
Learning 3D Representations from Procedural 3D Programs • Paper • 2411.17467 • Published 8 days ago • 8
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration • Paper • 2411.17686 • Published 7 days ago • 18
Star Attention: Efficient LLM Inference over Long Sequences • Paper • 2411.17116 • Published 8 days ago • 42
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization • Paper • 2411.10442 • Published 18 days ago • 61
Number it: Temporal Grounding Videos like Flipping Manga • Paper • 2411.10332 • Published 18 days ago • 12
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model • Paper • 2411.04496 • Published 26 days ago • 22
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? • Paper • 2411.05000 • Published 26 days ago • 21
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding • Paper • 2411.04952 • Published 26 days ago • 27
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation • Paper • 2411.04709 • Published 28 days ago • 25
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models • Paper • 2411.04905 • Published 26 days ago • 109
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning • Paper • 2411.05003 • Published 26 days ago • 70
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion • Paper • 2411.04928 • Published 26 days ago • 48