InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Paper
•
2309.03895
•
Published
•
13
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Paper
•
2309.16650
•
Published
•
10
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Paper
•
2309.16496
•
Published
•
9
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
Paper
•
2310.15169
•
Published
•
9
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Paper
•
2310.15008
•
Published
•
21
Matryoshka Diffusion Models
Paper
•
2310.15111
•
Published
•
41
TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
Paper
•
2310.13772
•
Published
•
6
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Paper
•
2310.17075
•
Published
•
14
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation
Paper
•
2310.17359
•
Published
•
1
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
•
2310.17680
•
Published
•
70
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
Paper
•
2310.19784
•
Published
•
9
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Paper
•
2310.20700
•
Published
•
9
Beyond U: Making Diffusion Models Faster & Lighter
Paper
•
2310.20092
•
Published
•
11
Controllable Music Production with Diffusion Models and Guidance Gradients
Paper
•
2311.00613
•
Published
•
25
De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper
•
2311.00618
•
Published
•
21
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper
•
2311.00945
•
Published
•
14
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
•
2311.10093
•
Published
•
56
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper
•
2311.09257
•
Published
•
45
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
Paper
•
2311.12052
•
Published
•
31
Diffusion Model Alignment Using Direct Preference Optimization
Paper
•
2311.12908
•
Published
•
47
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper
•
2312.03793
•
Published
•
17
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
•
2312.03491
•
Published
•
33
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper
•
2312.12491
•
Published
•
69
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
Paper
•
2312.13252
•
Published
•
27
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Paper
•
2312.12490
•
Published
•
17
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Paper
•
2312.13578
•
Published
•
27
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Paper
•
2402.00769
•
Published
•
22
Magic-Me: Identity-Specific Video Customized Diffusion
Paper
•
2402.09368
•
Published
•
27
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Paper
•
2402.06178
•
Published
•
13
FiT: Flexible Vision Transformer for Diffusion Model
Paper
•
2402.12376
•
Published
•
48
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Paper
•
2402.13763
•
Published
•
10
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper
•
2402.17485
•
Published
•
190
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Paper
•
2402.17723
•
Published
•
16
Scalable Diffusion Models with Transformers
Paper
•
2212.09748
•
Published
•
17
Paper
•
2403.03954
•
Published
•
11
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper
•
2403.05135
•
Published
•
42
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Paper
•
2403.09055
•
Published
•
24
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Paper
•
2406.08659
•
Published
•
8
Note
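The researchers use a diffusion model and factor the T2MVid generation problem into viewpoint-space and time components, reusing layers from pre-trained multi-view image and 2D video diffusion models to ensure multi-view consistency and temporal coherence in the generated videos. Alignment modules are introduced to resolve the layer incompatibility caused by the domain gap between 2D and multi-view data. The work also contributes a new multi-view video dataset.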
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors
Paper
•
2406.10111
•
Published
•
6
Note
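The proposed GaussianSR method introduces 2D generative priors and optimizes 3DGS while reducing the disturbance caused by randomness, achieving high-quality HRNVS and significantly outperforming existing state-of-the-art methods. The work offers a new approach to high-resolution view synthesis and has significant practical value.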
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper
•
2406.09416
•
Published
•
27
Note
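The proposed DiMR and TD-LN methods effectively balance capturing image detail against computational complexity, significantly reduce image distortion, and show excellent performance on ImageNet generation benchmarks, setting a new bar for high-fidelity image generation.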
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Paper
•
2406.07686
•
Published
•
14
Note
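AV-DiT presents an efficient audio-visual diffusion transformer architecture that achieves high-quality joint audio and video generation by reusing a pre-trained image generation transformer with lightweight adaptation. This fills a gap left by existing methods and also shows the potential of multimodal generation to reduce computational cost and model complexity.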
Repulsive Score Distillation for Diverse Sampling of Diffusion Models
Paper
•
2406.16683
•
Published
•
4
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper
•
2407.01392
•
Published
•
39
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization
Paper
•
2308.09716
•
Published
•
2
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Paper
•
2407.16982
•
Published
•
40
Diffusion Feedback Helps CLIP See Better
Paper
•
2407.20171
•
Published
•
36
DC3DO: Diffusion Classifier for 3D Objects
Paper
•
2408.06693
•
Published
•
10
3D Gaussian Editing with A Single Image
Paper
•
2408.07540
•
Published
•
10
TurboEdit: Instant text-based image editing
Paper
•
2408.08332
•
Published
•
19