Big thanks to
To make this demo as good as possible, our team spend a lot of time training a custom model. We used the LAION5B dataset to build our custom dataset, which contains 130k images of 15 types of rooms in almost 30 design styles. After fetching all these images, we started adding metadata such as captions (from the BLIP captioning model) and segmentation maps (from the HuggingFace UperNetForSemanticSegmentation model).
This dataset was then used to train the controlnet model to generate quality interior design images by using the segmentation maps and prompts as conditioning information for the model. By training on segmentation maps, the end user has a very finegrained control over which objects they want to place in their room.
The training started from the
lllyasviel/control_v11p_sd15_seg checkpoint, which is a robustly trained controlnet model conditioned on segmentation maps. This checkpoint got fine-tuned on a TPUv4 with the JAX framework. Afterwards, the checkpoint was converted into a PyTorch checkpoint for easy integration with the diffusers library.
Our team made a streamlit demo where you can test out the capabilities of this model. The resulting model is used in a community pipeline that supports image2image and inpainting, so the user can keep elements of their room and change specific parts of the image. https://huggingface.co/spaces/controlnet-interior-design/controlnet-seg