---
license: apache-2.0
---

# Flux-Mini
<div align="center">
<img src="flux_distill-flux-mini-teaser.jpg" width="800" alt="Teaser image">
</div>


A distilled `Flux-dev` model for efficient text-to-image generation.



Text-to-image (T2I) models keep growing stronger but also larger, which limits their practical use, especially on consumer-level devices.
To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `Flux-mini` model, aiming to preserve its strong image generation capabilities.
Specifically, we prune the original `Flux-dev` by reducing its depth from `19 + 38` (the numbers of double blocks and single blocks) to `5 + 10`.
The pruned model is then tuned with denoising and feature alignment objectives on a curated image-text dataset.
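
Concretely, initializing the student amounts to copying a small subset of the teacher's transformer blocks. The sketch below illustrates the idea; the module names (`double_blocks`, `single_blocks`) and the kept indices are hypothetical placeholders, since this card does not list which blocks were selected.

```python
import copy
import torch.nn as nn

class DummyFlux(nn.Module):
    """Stand-in for the Flux-dev transformer: 19 double + 38 single blocks.
    Real blocks are attention/MLP stacks; nn.Identity keeps the sketch tiny."""
    def __init__(self):
        super().__init__()
        self.double_blocks = nn.ModuleList(nn.Identity() for _ in range(19))
        self.single_blocks = nn.ModuleList(nn.Identity() for _ in range(38))

def prune_blocks(teacher, double_keep, single_keep):
    """Initialize a 5+10 student from selected teacher blocks."""
    student = copy.deepcopy(teacher)
    student.double_blocks = nn.ModuleList(
        copy.deepcopy(teacher.double_blocks[i]) for i in double_keep)
    student.single_blocks = nn.ModuleList(
        copy.deepcopy(teacher.single_blocks[i]) for i in single_keep)
    return student

# Placeholder indices -- the card keeps the "most important" blocks,
# but does not say which ones those are.
student = prune_blocks(DummyFlux(),
                       double_keep=[0, 4, 9, 14, 18],
                       single_keep=[0, 4, 8, 12, 16, 20, 24, 28, 32, 36])
assert len(student.double_blocks) == 5 and len(student.single_blocks) == 10
```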

We empirically found that different blocks have different impacts on generation quality, so we initialize the student model with the most important blocks.
The distillation process combines three objectives: a denoising loss, an output alignment loss, and a feature alignment loss.
The feature alignment loss encourages the output of `block-x` in the student model to match that of `block-4x` in the teacher model.
Distillation is performed in two stages: first on `512x512` LAION images recaptioned with `Qwen-VL` for `90k` steps,
then on `1024x1024` images generated by `Flux` from `JourneyDB` prompts for another `90k` steps.
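
A minimal sketch of the combined objective is shown below, assuming an MSE loss for each term, equal weighting, and zero-based block indexing (so student block `x` pairs with teacher block `4x`); none of these specifics are stated in this card.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats,
                      student_out, teacher_out,
                      student_pred, noise_target,
                      w_out=1.0, w_feat=1.0):
    """Combine the three objectives described above.

    student_feats[x] is the hidden state after student block x;
    teacher_feats[x] is the hidden state after teacher block x.
    MSE and the weights w_out/w_feat are assumptions of this sketch.
    """
    # 1) standard denoising objective on the student's prediction
    denoise = F.mse_loss(student_pred, noise_target)
    # 2) align the student's final output with the (frozen) teacher's
    out_align = F.mse_loss(student_out, teacher_out.detach())
    # 3) feature alignment: student block x vs teacher block 4x
    feat_align = sum(
        F.mse_loss(s, teacher_feats[4 * x].detach())
        for x, s in enumerate(student_feats)
    ) / len(student_feats)
    return denoise + w_out * out_align + w_feat * feat_align

# Toy check with random tensors (5 student blocks vs 19 teacher blocks).
s_feats = [torch.randn(2, 16) for _ in range(5)]
t_feats = [torch.randn(2, 16) for _ in range(19)]
loss = distillation_loss(s_feats, t_feats,
                         student_out=torch.randn(2, 16),
                         teacher_out=torch.randn(2, 16),
                         student_pred=torch.randn(2, 16),
                         noise_target=torch.randn(2, 16))
```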


GitHub: https://github.com/TencentARC/flux-toolkits
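
For inference, a diffusers-style loading path might look like the sketch below. Both the repo id `TencentARC/flux-mini` and the use of `FluxPipeline` are assumptions; consult the GitHub repository above for the officially supported workflow.

```python
import torch
from diffusers import FluxPipeline

# Hypothetical repo id / loading path -- check the GitHub repo above
# for the officially supported inference code.
pipe = FluxPipeline.from_pretrained("TencentARC/flux-mini",
                                    torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe("a tiny astronaut hatching from an egg on the moon",
             height=1024, width=1024,
             num_inference_steps=28,
             guidance_scale=3.5).images[0]
image.save("flux_mini_sample.png")
```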