Papers: arxiv:2302.05543

Adding Conditional Control to Text-to-Image Diffusion Models

Lvmin Zhang ,
Maneesh Agrawala
·published on Feb 10


We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.


Sign up or log in to comment

Models citing this paper 40

Browse 40 models citing this paper

Spaces citing this paper 163