EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published 3 days ago • 11
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published 2 days ago • 6
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting Paper • 2405.19957 • Published 2 days ago • 3
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published 2 days ago • 9
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published 2 days ago • 9
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published 4 days ago • 15
Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining Paper • 2405.14908 • Published 9 days ago • 10
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training Paper • 2405.15319 • Published 8 days ago • 19
iVideoGPT: Interactive VideoGPTs are Scalable World Models Paper • 2405.15223 • Published 8 days ago • 11
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published 8 days ago • 41
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published 11 days ago • 20
Dynamic data sampler for cross-language transfer learning in large language models Paper • 2405.10626 • Published 15 days ago • 3
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding Paper • 2405.08344 • Published 18 days ago • 10
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published 18 days ago • 25
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published 18 days ago • 17
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published 19 days ago • 19
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models Paper • 2405.09220 • Published 17 days ago • 22
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published 30 days ago • 44
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge Paper • 2405.00263 • Published May 1 • 13
MicroDreamer: Zero-shot 3D Generation in sim20 Seconds by Score-based Iterative Reconstruction Paper • 2404.19525 • Published Apr 30 • 8
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model Paper • 2404.19759 • Published Apr 30 • 22
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 24 • 16
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published Apr 25 • 7
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published Apr 25 • 18
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 49
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22 • 17
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22 • 37
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 51
How Far Can We Go with Practical Function-Level Program Repair? Paper • 2404.12833 • Published Apr 19 • 6
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19 • 27
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19 • 38
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models Paper • 2404.13013 • Published Apr 19 • 26
Transferable and Principled Efficiency for Open-Vocabulary Segmentation Paper • 2404.07448 • Published Apr 11 • 10
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 46
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 41
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9 • 18
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation Paper • 2404.05674 • Published Apr 8 • 11
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models Paper • 2404.04478 • Published Apr 6 • 11
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7 • 23
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7 • 24
UniFL: Improve Stable Diffusion via Unified Feedback Learning Paper • 2404.05595 • Published Apr 8 • 22
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Paper • 2404.04167 • Published Apr 5 • 8
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 102
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 28