Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published 10 days ago • 43
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 10 days ago • 19
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy Paper • 2203.07845 • Published Mar 15, 2022
Video Background Music Generation with Controllable Music Transformer Paper • 2111.08380 • Published Nov 16, 2021 • 1
Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need Paper • 2302.02615 • Published Feb 6, 2023
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion Paper • 2412.09593 • Published 10 days ago • 17
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published Sep 17, 2024 • 25
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12, 2024 • 53
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 44
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance Paper • 2306.00943 • Published Jun 1, 2023 • 5
Real-World Image Variation by Aligning Diffusion Inversion Chain Paper • 2305.18729 • Published May 30, 2023 • 4