Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations • Paper • 2410.10792 • Published about 18 hours ago • 7
Animate-X: Universal Character Image Animation with Enhanced Motion Representation • Paper • 2410.10306 • Published 1 day ago • 16
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow • Paper • 2410.07536 • Published 5 days ago • 2
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation • Paper • 2410.09009 • Published 4 days ago • 11
Think While You Generate: Discrete Diffusion with Planned Denoising • Paper • 2410.06264 • Published 7 days ago • 6
Mechanistic Permutability: Match Features Across Layers • Paper • 2410.07656 • Published 5 days ago • 14
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis • Paper • 2410.08261 • Published 5 days ago • 38
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models • Paper • 2410.08207 • Published 5 days ago • 17
MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting • Paper • 2410.07707 • Published 5 days ago • 3
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow • Paper • 2410.07303 • Published 6 days ago • 16
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design • Paper • 2410.05677 • Published 7 days ago • 11
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation • Paper • 2410.07171 • Published 6 days ago • 41
Inference Scaling for Long-Context Retrieval Augmented Generation • Paper • 2410.04343 • Published 9 days ago • 8
ControlAR: Controllable Image Generation with Autoregressive Models • Paper • 2410.02705 • Published 12 days ago • 7
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations • Paper • 2410.02707 • Published 12 days ago • 44
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction • Paper • 2410.04932 • Published 8 days ago • 8
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide • Paper • 2410.04364 • Published 9 days ago • 26
Loong: Generating Minute-level Long Videos with Autoregressive Language Models • Paper • 2410.02757 • Published 12 days ago • 35
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios • Paper • 2410.01481 • Published 13 days ago • 2
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation • Paper • 2410.01731 • Published 13 days ago • 15
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration • Paper • 2410.00418 • Published 14 days ago • 9
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation • Paper • 2409.18964 • Published 18 days ago • 23
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction • Paper • 2409.18124 • Published 19 days ago • 29
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models • Paper • 2409.17481 • Published 19 days ago • 46
Self-Supervised Any-Point Tracking by Contrastive Random Walks • Paper • 2409.16288 • Published 21 days ago • 5
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors • Paper • 2409.17058 • Published 20 days ago • 11
MaskBit: Embedding-free Image Generation via Bit Tokens • Paper • 2409.16211 • Published 21 days ago • 16
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling • Paper • 2409.16160 • Published 21 days ago • 32
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation • Paper • 2409.16283 • Published 21 days ago • 6
MonoFormer: One Transformer for Both Diffusion and Autoregression • Paper • 2409.16280 • Published 21 days ago • 17
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors • Paper • 2409.15273 • Published 22 days ago • 10
Temporally Aligned Audio for Video with Autoregression • Paper • 2409.13689 • Published 25 days ago • 7
Imagine yourself: Tuning-Free Personalized Image Generation • Paper • 2409.13346 • Published 25 days ago • 67
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation • Paper • 2409.12532 • Published 26 days ago • 5
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation • Paper • 2409.12576 • Published 26 days ago • 15
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer • Paper • 2409.08425 • Published Sep 12 • 9
A Controlled Study on Long Context Extension and Generalization in LLMs • Paper • 2409.12181 • Published 27 days ago • 43
Towards Diverse and Efficient Audio Captioning via Diffusion Models • Paper • 2409.09401 • Published Sep 14 • 6
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer • Paper • 2409.10819 • Published 28 days ago • 17
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think • Paper • 2409.11355 • Published 28 days ago • 27
InstantDrag: Improving Interactivity in Drag-based Image Editing • Paper • 2409.08857 • Published Sep 13 • 30
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources • Paper • 2409.08239 • Published Sep 12 • 15
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation • Paper • 2409.08240 • Published Sep 12 • 15