Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Abstract
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images; thus slider directions can be created for either textual or visual concepts. Concept Sliders are plug-and-play: they can be composed efficiently and continuously modulated, enabling precise control over image generation. In quantitative experiments comparing with previous editing techniques, our sliders exhibit stronger targeted edits with lower interference. We showcase sliders for weather, age, styles, and expressions, as well as slider compositions. We show how sliders can transfer latents from StyleGAN for intuitive editing of visual concepts for which textual description is difficult. We also find that our method can help address persistent quality issues in Stable Diffusion XL, including repair of object deformations and fixing distorted hands. Our code, data, and trained sliders are available at https://sliders.baulab.info/
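For readers who want to try a trained slider, the sketch below shows one plausible way to load a slider LoRA into a Stable Diffusion XL pipeline with the diffusers library and modulate its strength continuously at inference time. This is an assumed usage pattern, not the authors' code: the file name `age_slider.safetensors` and the scale values are placeholders (real slider weights are distributed at https://sliders.baulab.info/).

```python
# Minimal sketch (not the authors' training code): applying a pre-trained
# concept-slider LoRA at inference and sweeping its strength continuously.
# The slider file name below is a hypothetical placeholder.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Load the slider as an ordinary LoRA adapter (placeholder file name).
pipe.load_lora_weights(".", weight_name="age_slider.safetensors")

prompt = "a portrait photo of a person"
for scale in (-2.0, 0.0, 2.0):  # negative, neutral, and positive slider positions
    image = pipe(
        prompt,
        cross_attention_kwargs={"scale": scale},  # continuous slider strength
        generator=torch.Generator("cuda").manual_seed(0),  # fixed seed to isolate the edit
    ).images[0]
    image.save(f"portrait_scale_{scale}.png")
```

Because the slider is a plain LoRA adapter, the same plug-and-play mechanism lets several sliders be loaded and composed, with each one's strength set independently.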
Community
This sounds extremely promising. The possibilities for creating new artistic sliders are endless. Also the technique is very effective for fixing hands and wonky objects, which is huge, especially for the low cost.
The idea in this paper was previously proposed by the open-source community earlier this year.
For example, this blog explains in detail how to create the training data pairs and how to merge positive/negative LoRAs. I apologize if I missed it, but I didn't find any citation in the related work section. If you can't read Japanese, I've translated the main idea from the blog; please check here.
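For context, merging positive and negative LoRAs along these lines usually amounts to adding the low-rank weight deltas into the base weights with signed, user-chosen strengths. The sketch below illustrates that arithmetic on raw state dicts under assumed conventions; the key layout, helper names, and file paths are hypothetical and are not taken from the blog or the paper.

```python
# Hypothetical sketch of merging a positive and a negative LoRA into base
# weights with signed strengths: W' = W + a_pos * dW_pos - a_neg * dW_neg,
# where each delta dW = up @ down (the usual low-rank LoRA factorization).
# Key names and file paths are illustrative assumptions.
import torch
from safetensors.torch import load_file

def lora_delta(lora_sd: dict, key: str) -> torch.Tensor:
    """Reconstruct the full-rank weight delta for one layer from its LoRA factors."""
    up = lora_sd[f"{key}.lora_up.weight"]      # shape (out_dim, rank)
    down = lora_sd[f"{key}.lora_down.weight"]  # shape (rank, in_dim)
    return up @ down                           # shape (out_dim, in_dim)

def merge_sliders(base_sd, pos_sd, neg_sd, keys, a_pos=1.0, a_neg=1.0):
    """Bake signed positive/negative LoRA deltas into copies of the base weights."""
    merged = dict(base_sd)
    for key in keys:
        delta = a_pos * lora_delta(pos_sd, key) - a_neg * lora_delta(neg_sd, key)
        merged[f"{key}.weight"] = base_sd[f"{key}.weight"] + delta
    return merged

# Usage (paths are placeholders):
# pos_sd = load_file("slider_positive.safetensors")
# neg_sd = load_file("slider_negative.safetensors")
```

Keeping the two LoRAs separate and only combining them at merge (or inference) time is what makes the strengths `a_pos` and `a_neg` act as a continuous slider rather than a fixed edit.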
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing (2023)
- Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else (2023)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors (2023)
- LOVECon: Text-driven Training-Free Long Video Editing with ControlNet (2023)
- Localizing and Editing Knowledge in Text-to-Image Generative Models (2023)
Oh, this is very cool! A new way to do image-based training! We had not come across your blog, but now that we have, we will cite it in our next version. Thanks for letting us know!