Diffusion Models Fundamental Papers (Read First)
Paper • 2302.05543
Note: ControlNet
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
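Paper • 2308.06721
Note: A small add-on model (22M parameters) for conditioning an LDM on an image prompt. How? In each cross-attention layer, the LDM's UNet has cross-attention (XAtt) between its latent features (queries) and the text features (keys/values). IP-Adapter adds a second XAtt in each of those layers, between the same latent features and the conditioning-image features (from a CLIP image encoder), and sums the outputs of the two XAtt types.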
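A minimal NumPy sketch of this decoupled cross-attention, assuming single-head attention and toy dimensions. The weight dictionary `W` and its key names are hypothetical. In the paper, the text K/V projections come from the frozen base model, and only the new image K/V projections (plus a small image-feature projection) are trained. `scale` weights the image branch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q: (n_q, d), k: (n_kv, d), v: (n_kv, d_v) -> (n_q, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def decoupled_cross_attention(latent, text_tokens, image_tokens, W, scale=1.0):
    """IP-Adapter-style sketch: one shared query, separate K/V for text and image.

    W is a hypothetical dict of projection matrices:
      "q"                  - query projection (shared, from the base model)
      "k_text", "v_text"   - text K/V projections (frozen base model)
      "k_image", "v_image" - new image K/V projections (the trainable part)
    """
    q = latent @ W["q"]
    text_out = attention(q, text_tokens @ W["k_text"], text_tokens @ W["v_text"])
    image_out = attention(q, image_tokens @ W["k_image"], image_tokens @ W["v_image"])
    # The two attention outputs are summed; scale=0 recovers the text-only model
    return text_out + scale * image_out
```

Setting `scale=0.0` makes the layer behave exactly like the original text-only cross-attention, which is why the adapter can be bolted onto a frozen model without changing its text-conditioned behavior.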
High-Resolution Image Synthesis with Latent Diffusion Models
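Paper • 2112.10752
Note: The original Latent Diffusion Model (LDM). Main differences from pixel-space diffusion models (DMs):
- Works in latent space: a frozen VAE encoder extracts latent features, which are then fed to the UNet (the attention-based denoising model).
- Processing in the low-dimensional latent space has a smaller memory footprint and lower compute cost than operating in high-dimensional pixel space.
- The LDM can be conditioned efficiently on a text prompt: the text is embedded by a text encoder (a transformer trained jointly with the denoiser in the paper; the frozen CLIP text encoder in Stable Diffusion), then in each cross-attention layer Q comes from the image (latent) features and K, V come from the text embedding.
- A VAE decoder reconstructs the image from the denoised latent.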
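A rough NumPy sketch of the two ideas in this note, under stated assumptions: a Stable-Diffusion-style autoencoder with downsampling factor 8 and 4 latent channels (assumed defaults, not fixed by the LDM paper), and single-head cross-attention applied directly to a toy latent. In the real UNet, cross-attention is multi-head and acts on intermediate feature maps rather than on the raw 4-channel latent. All function names are illustrative.

```python
import math
import numpy as np

def latent_shape(height, width, downsample=8, latent_channels=4):
    # Assumed SD-style autoencoder: spatial size shrinks by `downsample`
    return (latent_channels, height // downsample, width // downsample)

def compression_ratio(height, width, image_channels=3, **kw):
    # Pixel-space values per image divided by latent-space values per image
    return (height * width * image_channels) / math.prod(latent_shape(height, width, **kw))

def cross_attention(latent, text_emb, w_q, w_k, w_v):
    """Q from the flattened latent features, K and V from the text embedding."""
    c, h, w = latent.shape
    tokens = latent.reshape(c, h * w).T       # (h*w, c): one query per spatial position
    q = tokens @ w_q                          # (h*w, d)
    k = text_emb @ w_k                        # (n_text, d)
    v = text_emb @ w_v                        # (n_text, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (h*w, n_text)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                        # (h*w, d): text-conditioned features
```

Under these assumptions, a 512×512 RGB image becomes a (4, 64, 64) latent, and `compression_ratio(512, 512)` gives 48.0: the denoiser processes 48× fewer values per sample than a pixel-space model would.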