Dushwe's Collections
OmnimatteRF: Robust Omnimatte with 3D Background Modeling • Paper • 2309.07749 • Published • 7
AudioSR: Versatile Audio Super-resolution at Scale • Paper • 2309.07314 • Published • 25
Generative Image Dynamics • Paper • 2309.07906 • Published • 53
MagiCapture: High-Resolution Multi-Concept Portrait Customization • Paper • 2309.06895 • Published • 27
Text-Guided Generation and Editing of Compositional 3D Avatars • Paper • 2309.07125 • Published • 6
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models • Paper • 2309.06933 • Published • 12
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models • Paper • 2309.05793 • Published • 50
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation • Paper • 2309.06380 • Published • 32
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation • Paper • 2309.00908 • Published • 4
Diffusion Generative Inverse Design • Paper • 2309.02040 • Published • 3
Dual-Stream Diffusion Net for Text-to-Video Generation • Paper • 2308.08316 • Published • 23
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing • Paper • 2308.07926 • Published • 27
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer • Paper • 2308.06873 • Published • 25
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models • Paper • 2308.06721 • Published • 29
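A minimal usage sketch of the decoupled image-prompt idea, assuming the diffusers load_ip_adapter helper and the h94/IP-Adapter weights on the Hub; the repo, file names, and inputs below are assumptions, not the authors' reference code:

    # IP-Adapter sketch: attach image-prompt cross-attention weights to an
    # existing pipeline (assumes diffusers >= 0.22 and the h94/IP-Adapter repo).
    import torch
    from diffusers import StableDiffusionPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                         weight_name="ip-adapter_sd15.bin")
    pipe.set_ip_adapter_scale(0.6)  # blend between text and image conditioning

    ref = load_image("reference.png")  # hypothetical local reference image
    image = pipe(prompt="a dog in a meadow", ip_adapter_image=ref,
                 num_inference_steps=30).images[0]
    image.save("ip_adapter_out.png")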
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining • Paper • 2308.05734 • Published • 37
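A text-to-audio sketch, assuming the diffusers AudioLDM2Pipeline and the cvssp/audioldm2 checkpoint (neither named in this list); a usage illustration, not the paper's training code:

    # AudioLDM 2 sketch: latent diffusion over audio, prompted by text.
    import torch
    import scipy.io.wavfile
    from diffusers import AudioLDM2Pipeline

    pipe = AudioLDM2Pipeline.from_pretrained(
        "cvssp/audioldm2", torch_dtype=torch.float16
    ).to("cuda")
    audio = pipe("gentle rain on a tin roof",
                 num_inference_steps=200, audio_length_in_s=10.0).audios[0]
    scipy.io.wavfile.write("rain.wav", rate=16000, data=audio)  # 16 kHz output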
3D Gaussian Splatting for Real-Time Radiance Field Rendering • Paper • 2308.04079 • Published • 172
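The heart of the renderer is front-to-back alpha compositing of depth-sorted, splatted Gaussians. A toy single-pixel sketch of that compositing rule in plain NumPy (not the paper's CUDA tile rasterizer):

    # Front-to-back compositing for one pixel over depth-sorted Gaussians:
    # C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).
    import numpy as np

    def composite(colors: np.ndarray, alphas: np.ndarray) -> np.ndarray:
        """colors: (N, 3) RGB per Gaussian, near to far; alphas: (N,) opacities."""
        pixel = np.zeros(3)
        transmittance = 1.0
        for c, a in zip(colors, alphas):
            pixel += transmittance * a * c
            transmittance *= 1.0 - a
            if transmittance < 1e-4:  # early termination, as tile rasterizers do
                break
        return pixel

    print(composite(np.array([[1.0, 0, 0], [0, 1.0, 0]]), np.array([0.6, 0.5])))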
ConceptLab: Creative Generation using Diffusion Prior Constraints • Paper • 2308.02669 • Published • 23
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing • Paper • 2308.03280 • Published • 6
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies • Paper • 2308.01546 • Published • 17
Computational Long Exposure Mobile Photography • Paper • 2308.01379 • Published • 3
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization • Paper • 2307.15199 • Published • 11
Interpolating between Images with Diffusion Models • Paper • 2307.12560 • Published • 19
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning • Paper • 2307.11410 • Published • 15
Text2Layer: Layered Image Generation using Latent Diffusion Model • Paper • 2307.09781 • Published • 14
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models • Paper • 2307.06949 • Published • 50
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models • Paper • 2307.06925 • Published • 10
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation • Paper • 2307.06350 • Published • 6
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning • Paper • 2307.04725 • Published • 64
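A sketch of the plug-in motion-module idea, assuming the diffusers AnimateDiffPipeline and the guoyww/animatediff-motion-adapter-v1-5-2 weights; model ids and settings are assumptions, and any SD 1.5-family base model could stand in:

    # AnimateDiff sketch: a pretrained motion adapter animates a frozen T2I model.
    import torch
    from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
    from diffusers.utils import export_to_gif

    adapter = MotionAdapter.from_pretrained(
        "guoyww/animatediff-motion-adapter-v1-5-2")
    pipe = AnimateDiffPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", motion_adapter=adapter,
        torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config,
                                               beta_schedule="linear")
    frames = pipe("a rocket launching, cinematic", num_frames=16,
                  num_inference_steps=25).frames[0]
    export_to_gif(frames, "rocket.gif")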
Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation • Paper • 2307.03869 • Published • 22
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis • Paper • 2307.01952 • Published • 82
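A minimal two-stage sketch of the published base + refiner flow (model ids as released by Stability AI on the Hub); the prompt and file names are placeholders:

    # SDXL sketch: the base model produces latents, the refiner polishes them.
    import torch
    from diffusers import (StableDiffusionXLPipeline,
                           StableDiffusionXLImg2ImgPipeline)

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")

    prompt = "a lighthouse at dawn, volumetric light"
    latents = base(prompt, output_type="latent").images  # hand latents onward
    image = refiner(prompt, image=latents).images[0]
    image.save("lighthouse.png")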
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization • Paper • 2306.16928 • Published • 38
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals • Paper • 2306.16934 • Published • 31
Generate Anything Anywhere in Any Scene • Paper • 2306.17154 • Published • 21
FoleyGen: Visually-Guided Audio Generation • Paper • 2309.10537 • Published • 8
FreeU: Free Lunch in Diffusion U-Net • Paper • 2309.11497 • Published • 64
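FreeU rescales the U-Net's backbone (b) and skip (s) features at inference, with no training and no extra weights. A sketch assuming the diffusers enable_freeu helper; the s/b values are the ones suggested for SD 1.5 in the FreeU repository:

    # FreeU sketch: one call toggles the feature reweighting on.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.5, b2=1.6)
    image = pipe("an oil painting of a fox").images[0]
    pipe.disable_freeu()  # restore the vanilla U-Net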
DreamLLM: Synergistic Multimodal Comprehension and Creation • Paper • 2309.11499 • Published • 58
ProPainter: Improving Propagation and Transformer for Video Inpainting • Paper • 2309.03897 • Published • 26
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models • Paper • 2309.15103 • Published • 42
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning • Paper • 2309.15091 • Published • 32
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models • Paper • 2309.14717 • Published • 44
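QA-LoRA makes LoRA fine-tuning quantization-aware. For reference, a minimal sketch of the plain LoRA update it builds on, y = W0 x + (alpha/r) B A x with W0 frozen; this is the generic technique, not the paper's quantization-aware variant:

    # Minimal LoRA linear layer: frozen base weight plus a low-rank update.
    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)  # freeze pretrained weights
            self.A = nn.Linear(base.in_features, r, bias=False)
            self.B = nn.Linear(r, base.out_features, bias=False)
            nn.init.normal_(self.A.weight, std=0.01)
            nn.init.zeros_(self.B.weight)            # start as an identity update
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * self.B(self.A(x))

    layer = LoRALinear(nn.Linear(768, 768))
    print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])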
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation • Paper • 2309.15818 • Published • 19
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack • Paper • 2309.15807 • Published • 32
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation • Paper • 2309.16653 • Published • 46
Qwen Technical Report • Paper • 2309.16609 • Published • 35
CCEdit: Creative and Controllable Video Editing via Diffusion Models • Paper • 2309.16496 • Published • 9
RealFill: Reference-Driven Generation for Authentic Image Completion • Paper • 2309.16668 • Published • 14
Deep Geometrized Cartoon Line Inbetweening • Paper • 2309.16643 • Published • 24
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation • Paper • 2309.16429 • Published • 11
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis • Paper • 2310.00426 • Published • 61
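A usage sketch for the diffusion-transformer model, assuming the diffusers PixArtAlphaPipeline and the PixArt-alpha/PixArt-XL-2-1024-MS checkpoint; prompt and file name are placeholders:

    # PixArt-α sketch: text-to-image with a transformer denoiser.
    import torch
    from diffusers import PixArtAlphaPipeline

    pipe = PixArtAlphaPipeline.from_pretrained(
        "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("a watercolor map of an imaginary island").images[0]
    image.save("island.png")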
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • Paper • 2310.03502 • Published • 78
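Kandinsky chains a text-to-image-embedding prior with a latent-diffusion decoder. A sketch assuming the diffusers AutoPipelineForText2Image, which wires the kandinsky-community prior and decoder together; the model id is an assumption:

    # Kandinsky sketch: prior + decoder behind one auto-pipeline.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("a portrait in the style of a stained-glass window").images[0]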
Aligning Text-to-Image Diffusion Models with Reward Backpropagation • Paper • 2310.03739 • Published • 21
UniAudio: An Audio Foundation Model Toward Universal Audio Generation • Paper • 2310.00704 • Published • 21
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation • Paper • 2310.08541 • Published • 17
MotionDirector: Motion Customization of Text-to-Video Diffusion Models • Paper • 2310.08465 • Published • 14
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation • Paper • 2310.07697 • Published • 1
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing • Paper • 2310.05922 • Published • 4
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation • Paper • 2309.00398 • Published • 20
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation • Paper • 2309.03549 • Published • 5
GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors • Paper • 2310.08529 • Published • 17
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model • Paper • 2310.09520 • Published • 10
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models • Paper • 2310.07653 • Published • 2
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens • Paper • 2310.02239 • Published • 2
4K4D: Real-Time 4D View Synthesis at 4K Resolution • Paper • 2310.11448 • Published • 36
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V • Paper • 2310.11441 • Published • 26
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation • Paper • 2310.10769 • Published • 8
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models • Paper • 2310.11440 • Published • 15
Kosmos-G: Generating Images in Context with Multimodal Large Language Models • Paper • 2310.02992 • Published • 4
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models • Paper • 2310.11954 • Published • 25
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation • Paper • 2310.13119 • Published • 11
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics • Paper • 2310.13268 • Published • 17
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling • Paper • 2310.15169 • Published • 9
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation • Paper • 2310.19512 • Published • 15
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models • Paper • 2311.04145 • Published • 32
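An image-to-video usage sketch, assuming the diffusers I2VGenXLPipeline and the ali-vilab/i2vgen-xl checkpoint; the input frame, prompt, and settings are placeholders:

    # I2VGen-XL sketch: animate a single still image under a text prompt.
    import torch
    from diffusers import I2VGenXLPipeline
    from diffusers.utils import load_image, export_to_gif

    pipe = I2VGenXLPipeline.from_pretrained(
        "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    image = load_image("still.png")  # hypothetical input frame
    frames = pipe(prompt="waves rolling onto the beach", image=image,
                  num_inference_steps=50, guidance_scale=9.0).frames[0]
    export_to_gif(frames, "waves.gif")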
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module • Paper • 2311.05556 • Published • 81
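A sketch of the acceleration recipe: load the distilled consistency LoRA on top of a stock base model and swap in the LCM scheduler for roughly 4-step sampling at low guidance. Model ids are the ones published by the latent-consistency org on the Hub:

    # LCM-LoRA sketch: few-step sampling via a universal LoRA + LCM scheduler.
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
    image = pipe("close-up photo of a hummingbird",
                 num_inference_steps=4, guidance_scale=1.0).images[0]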
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text • Paper • 2311.07446 • Published • 28
Music ControlNet: Multiple Time-varying Controls for Music Generation • Paper • 2311.07069 • Published • 43
ChatAnything: Facetime Chat with LLM-Enhanced Personas • Paper • 2311.06772 • Published • 35
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion • Paper • 2311.07885 • Published • 39
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models • Paper • 2311.06783 • Published • 26
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying • Paper • 2311.09578 • Published • 14
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models • Paper • 2311.10093 • Published • 56
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs • Paper • 2311.09257 • Published • 45
Single-Image 3D Human Digitization with Shape-Guided Diffusion • Paper • 2311.09221 • Published • 21
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model • Paper • 2311.09217 • Published • 21
Drivable 3D Gaussian Avatars • Paper • 2311.08581 • Published • 46
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning • Paper • 2311.10709 • Published • 24
MVDream: Multi-view Diffusion for 3D Generation • Paper • 2308.16512 • Published • 102
Make Pixels Dance: High-Dynamic Video Generation • Paper • 2311.10982 • Published • 67
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort • Paper • 2311.11243 • Published • 14
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning • Paper • 2311.11501 • Published • 33
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction • Paper • 2311.12024 • Published • 19
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer • Paper • 2311.12052 • Published • 31
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning • Paper • 2311.12631 • Published • 13
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models • Paper • 2311.12092 • Published • 21
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression • Paper • 2311.10794 • Published • 24
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs • Paper • 2311.13600 • Published • 42
LEDITS++: Limitless Image Editing using Text-to-Image Models • Paper • 2311.16711 • Published • 22
MoMask: Generative Masked Modeling of 3D Human Motions • Paper • 2312.00063 • Published • 15
Hierarchical Masked 3D Diffusion Model for Video Outpainting • Paper • 2309.02119 • Published • 10
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures • Paper • 2312.02963 • Published • 9
DragVideo: Interactive Drag-style Video Editing • Paper • 2312.02216 • Published • 10
LivePhoto: Real Image Animation with Text-guided Motion Control • Paper • 2312.02928 • Published • 16
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model • Paper • 2312.02238 • Published • 25
FaceStudio: Put Your Face Everywhere in Seconds • Paper • 2312.02663 • Published • 30
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation • Paper • 2312.03641 • Published • 20
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions • Paper • 2312.03611 • Published • 7
Context Diffusion: In-Context Aware Image Generation • Paper • 2312.03584 • Published • 14
Kandinsky 3.0 Technical Report • Paper • 2312.03511 • Published • 43
Photorealistic Video Generation with Diffusion Models • Paper • 2312.06662 • Published • 23
CCM: Adding Conditional Controls to Text-to-Image Consistency Models • Paper • 2312.06971 • Published • 10
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition • Paper • 2312.07536 • Published • 16
FreeInit: Bridging Initialization Gap in Video Diffusion Models • Paper • 2312.07537 • Published • 25
StarVector: Generating Scalable Vector Graphics Code from Images • Paper • 2312.11556 • Published • 27
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing • Paper • 2312.11392 • Published • 19
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance • Paper • 2312.11396 • Published • 10
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models • Paper • 2312.09767 • Published • 25
VideoLCM: Video Latent Consistency Model • Paper • 2312.09109 • Published • 22
InstructVideo: Instructing Video Diffusion Models with Human Feedback • Paper • 2312.12490 • Published • 17
VideoPoet: A Large Language Model for Zero-Shot Video Generation • Paper • 2312.14125 • Published • 44
Generative Multimodal Models are In-Context Learners • Paper • 2312.13286 • Published • 34
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models • Paper • 2312.16693 • Published • 14
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM • Paper • 2401.01256 • Published • 19
Instruct-Imagen: Image Generation with Multi-modal Instruction • Paper • 2401.01952 • Published • 31
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation • Paper • 2401.00896 • Published • 14
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation • Paper • 2401.04468 • Published • 48
URHand: Universal Relightable Hands • Paper • 2401.05334 • Published • 22
Object-Centric Diffusion for Efficient Video Editing • Paper • 2401.05735 • Published • 7
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling • Paper • 2401.15977 • Published • 37
StableIdentity: Inserting Anybody into Anywhere at First Sight • Paper • 2401.15975 • Published • 17
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • Paper • 2402.00769 • Published • 22
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • Paper • 2402.17177 • Published • 88
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on • Paper • 2403.01779 • Published • 28