Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published 5 days ago • 18
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published 5 days ago • 66
Vivid-ZOO: Multi-View Video Generation with Diffusion Model Paper • 2406.08659 • Published 6 days ago • 7
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors Paper • 2406.10111 • Published 5 days ago • 6
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published 6 days ago • 17
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published 7 days ago • 19
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published 6 days ago • 17
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published 6 days ago • 26
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published 9 days ago • 21
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published 9 days ago • 33
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published 11 days ago • 38
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising Paper • 2406.06911 • Published 8 days ago • 10
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 8 days ago • 49
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models Paper • 2406.07472 • Published 8 days ago • 9
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published 11 days ago • 11
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis Paper • 2406.06216 • Published 9 days ago • 14
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published 13 days ago • 36
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 13 days ago • 26
4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published 19 days ago • 9
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 19 days ago • 60
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published 21 days ago • 12
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published 21 days ago • 20
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published 26 days ago • 14
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning Paper • 2405.17258 • Published 23 days ago • 12
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels Paper • 2405.16822 • Published 23 days ago • 11
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published 30 days ago • 23
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Paper • 2405.11252 • Published May 18 • 11
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published about 1 month ago • 53
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2 • 49
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28 • 25
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30 • 20
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 64
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 69
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning Paper • 2404.15449 • Published Apr 23 • 11
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 24 • 16
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 56
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16
SnapKV: LLM Knows What You are Looking for Before Generation Paper • 2404.14469 • Published Apr 22 • 23
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Paper • 2404.13208 • Published Apr 19 • 38
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 239
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models Paper • 2404.07724 • Published Apr 11 • 10
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 28
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 97