arxiv:2503.10637

Distilling Diversity and Control in Diffusion Models

Published on Mar 13

Submitted by RohitGandikota on Mar 14

Authors:

Rohit Gandikota ,

Abstract

Distilled diffusion models suffer from a critical limitation: reduced sample diversity compared to their base counterparts. In this work, we uncover that despite this diversity loss, distilled models retain the fundamental concept representations of base models. We demonstrate control distillation - where control mechanisms like Concept Sliders and LoRAs trained on base models can be seamlessly transferred to distilled models and vice-versa, effectively distilling control without any retraining. This preservation of representational structure prompted our investigation into the mechanisms of diversity collapse during distillation. To understand how distillation affects diversity, we introduce Diffusion Target (DT) Visualization, an analysis and debugging tool that reveals how models predict final outputs at intermediate steps. Through DT-Visualization, we identify generation artifacts, inconsistencies, and demonstrate that initial diffusion timesteps disproportionately determine output diversity, while later steps primarily refine details. Based on these insights, we introduce diversity distillation - a hybrid inference approach that strategically employs the base model for only the first critical timestep before transitioning to the efficient distilled model. Our experiments demonstrate that this simple modification not only restores the diversity capabilities from base to distilled models but surprisingly exceeds it, while maintaining nearly the computational efficiency of distilled inference, all without requiring additional training or model modifications. Our code and data are available at https://distillation.baulab.info

Community

RohitGandikota

Paper author Paper submitter about 12 hours ago

Understanding the role of each timestep in a diffusion model has been challenging for many reasons. We propose DT (Diffusion Target) Visualization, a variant of the x̂ prediction. This visualization helps show what the model "thinks" at every timestep!
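To make the idea concrete, here is a minimal sketch of the standard clean-sample estimate that this kind of visualization builds on. It assumes a noise-prediction model and the usual DDPM forward process; the function name `predict_final_output` and the toy values are hypothetical and not taken from the paper's code.

```python
import math

def predict_final_output(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample from a noisy one.

    Assumes a noise-prediction parameterization:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    """
    scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / scale for x, e in zip(x_t, eps_pred)]

# Toy check: noise a known "image" forward, then recover it with the true noise.
x0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.7, -0.3]
alpha_bar_t = 0.3  # assumed cumulative signal level at some timestep t
x_t = [
    math.sqrt(alpha_bar_t) * a + math.sqrt(1.0 - alpha_bar_t) * e
    for a, e in zip(x0, eps)
]
x0_hat = predict_final_output(x_t, eps, alpha_bar_t)
```

With a real model, `eps_pred` would be the network's output at timestep t, and decoding `x0_hat` at each step would show what the model currently expects the final image to be.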

Using DT Visualization, we discovered that distilled models suffer mode collapse due to the disproportionate usage of timesteps, and we propose a simple way to fix this, with no training!!!

Our hybrid, training-free approach makes distilled models far more diverse: as diverse as the base models (in fact slightly more so) while maintaining nearly the same fast inference speed.
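The hybrid schedule can be sketched roughly as follows. This is an illustration under stated assumptions, not the released implementation: `hybrid_sample`, `base_step`, `distilled_step`, and the timestep values are hypothetical stand-ins, and each step function is assumed to take the current sample and a timestep and return the next sample.

```python
def hybrid_sample(x_init, timesteps, base_step, distilled_step, num_base_steps=1):
    """Run the base model for the first step(s), then the distilled model."""
    x = x_init
    for i, t in enumerate(timesteps):
        # The earliest steps set the overall structure, so they use the base model.
        step = base_step if i < num_base_steps else distilled_step
        x = step(x, t)
    return x

# Toy stand-ins that record which model handled each step.
calls = []

def base_step(x, t):
    calls.append("base")
    return x + 1.0

def distilled_step(x, t):
    calls.append("distilled")
    return x * 0.5

result = hybrid_sample(0.0, [999, 749, 499, 249], base_step, distilled_step)
```

In this toy run only the first of the four steps goes through the base model, which is why the extra cost over purely distilled inference stays small.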

Project Page: https://distillation.baulab.info
Code: https://github.com/rohitgandikota/distillation
