Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published 18 days ago • 18
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published 29 days ago • 12
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 28 days ago • 23
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel Paper • 2412.08467 • Published Dec 11, 2024 • 5
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published Dec 12, 2024 • 21
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration Paper • 2410.12183 • Published Oct 16, 2024 • 3
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10, 2024 • 66
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 65
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding Paper • 2403.15377 • Published Mar 22, 2024 • 22