Submitted by akhaliq · 66 upvotes · Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing · 7 authors · 5 comments
Submitted by yulunliu · 51 upvotes · NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing · 6 authors · 2 comments
Submitted by myownskyW7 · 40 upvotes · MotionClone: Training-Free Motion Cloning for Controllable Video Generation · 9 authors · 4 comments
Submitted by Liuff23 · 37 upvotes · Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion · 6 authors · 4 comments
Submitted by yixinsong · 37 upvotes · PowerInfer-2: Fast Large Language Model Inference on a Smartphone · 6 authors · 5 comments
Submitted by lixin4ever · 35 upvotes · VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · 11 authors · 2 comments
Submitted by jedyang97 · 29 upvotes · 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination · 7 authors · 2 comments
Submitted by akhaliq · 27 upvotes · MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos · 14 authors
Submitted by yixinsong · 26 upvotes · Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters · 7 authors · 2 comments
Submitted by GlyphByT5 · 20 upvotes · FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation · 8 authors
Submitted by chrlu · 16 upvotes · Discovering Preference Optimization Algorithms with and for Large Language Models · 7 authors
Submitted by akhaliq · 16 upvotes · AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation · 5 authors
Submitted by akhaliq · 15 upvotes · Hierarchical Patch Diffusion Models for High-Resolution Video Generation · 4 authors
Submitted by yifanzhang114 · 13 upvotes · Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models · 7 authors · 2 comments
Submitted by chrisliu298 · 9 upvotes · Large Language Model Unlearning via Embedding-Corrupted Prompts · 4 authors
Submitted by AliBehrouz · 9 upvotes · Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models · 3 authors · 1 comment