<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">
  <link href="https://fonts.googleapis.com/css2?family=Montserrat:wght@500&display=swap" rel="stylesheet">
  <link rel="stylesheet" href="style.css">
  <title>M.o.f.u.</title>
</head>
<body>
  <h1 class="header-title">M.o.f.u.</h1>
  <p class="header-subtitle"><span class="highlight-orange">Mo</span>del-independent, <span class="highlight-violate">F</span>ast T<span class="highlight-orange">u</span>ning of Stable Diffusion concepts</p>
  <section id="abstract">
    <h2><i class="icon fas fa-file-alt"></i> Abstract</h2>
    <p>I present MoFu, a model-independent, fast tuning
approach for adding styles and concepts to Stable
Diffusion. Unlike more traditional methods such as
Low-Rank Adaptation (LoRA) or full fine-tuning, MoFu
does not modify the weights of the main model at all.
Instead, it works on the output of Stable Diffusion's
text encoder, enabling rapid style or concept addition
without modifying or fine-tuning the encoder's
weights.</p>
  </section>
  
  <section id="methodology">
    <h2><i class="icon fas fa-flask"></i> Methodology</h2>
    <p>MoFu relies on a simple process. First, I compare the
natural-language prompts given to a set of images; this
comparison extracts the essential concepts or styles
from the text prompts. The identified concepts are then
stored in a mixin, a compact representation of the
desired style information. The mixin is designed to be
compatible with Stable Diffusion's architecture and acts
as an additive term on the text encoder's output.</p>
    <p>Adding the mixin to the text encoder's output injects
the extracted concepts into the image generation process.
The mixin (the MoFu model itself) can also be multiplied
by a weight to make its effect stronger or weaker. Stable
Diffusion can then generate images in the desired style
without any change to the weights of the main model. As a
result, MoFu offers a flexible way to transfer styles or
add concepts to Stable Diffusion without extensive model
modifications or resource-intensive fine-tuning.</p>
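    <p>The page does not spell out how the prompts are compared, so the following is only a minimal sketch under stated assumptions, not the MoFu implementation. It assumes the comparison is a mean difference between text-encoder outputs for prompts with and without the concept, and it uses a hypothetical bag-of-words <code>toy_encode</code> in place of Stable Diffusion's CLIP text encoder (which, for Stable Diffusion 1.x, actually outputs a 77×768 matrix of per-token embeddings). The names <code>extract_mixin</code> and <code>apply_mixin</code> are hypothetical; only the additive, weighted use of the mixin comes from the description above.</p>
<pre><code class="language-python"># Hypothetical sketch of a MoFu-style mixin; NOT the official MoFu code.
# Assumption: the mixin is the mean difference between encoder outputs for
# prompts with and without the concept. toy_encode is a stand-in for the
# real CLIP text encoder and returns one small vector per prompt.

VOCAB = ["a", "photo", "of", "cat", "dog", "house", "watercolor", "style"]


def toy_encode(prompt):
    """Hypothetical text encoder: counts of each VOCAB word in the prompt."""
    words = prompt.lower().split()
    return [float(words.count(token)) for token in VOCAB]


def extract_mixin(prompt_pairs, encode=toy_encode):
    """Average encode(with concept) - encode(without concept) over all pairs."""
    diffs = [
        [a - b for a, b in zip(encode(with_c), encode(without_c))]
        for with_c, without_c in prompt_pairs
    ]
    n = len(diffs)
    # Column-wise mean: one value per embedding dimension.
    return [sum(column) / n for column in zip(*diffs)]


def apply_mixin(embedding, mixin, weight=1.0):
    """Add the weighted mixin to the encoder output; no model weights change."""
    return [e + weight * m for e, m in zip(embedding, mixin)]


pairs = [
    ("a photo of a cat watercolor style", "a photo of a cat"),
    ("a photo of a dog watercolor style", "a photo of a dog"),
]
mixin = extract_mixin(pairs)
base = toy_encode("a photo of a house")
styled = apply_mixin(base, mixin, weight=0.5)
print(mixin)   # prints [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
print(styled)  # prints [2.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.5, 0.5]</code></pre>
    <p>In this toy setup the difference isolates the "watercolor style" words, and the weight scales how strongly they are added to an unrelated prompt. Because the mixin is applied after the encoder, a weight of 0 recovers the original conditioning; with a real encoder the same addition would be applied elementwise to every token embedding.</p>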
  </section>
  
  <section id="results">
    <h2><i class="icon fas fa-chart-bar"></i> Results</h2>
    <p>To evaluate the effectiveness of MoFu, I conducted a
series of experiments comparing it to LoRA and to
fine-tuning. These results show that MoFu achieves
performance comparable to LoRA while requiring far
less training time: only around 10 to 20 seconds on
average (the process is primarily CPU-bound). LoRAs,
by contrast, typically take several hours to train.
However, I also observed that MoFu falls short of
fine-tuning, which achieves better precision and
quality at the cost of much longer training.</p>
  </section>
  
  <section id="conclusion">
    <h2><i class="icon fas fa-clipboard-check"></i> Conclusion</h2>
    <p>In conclusion, MoFu offers an efficient,
model-independent way to add new styles or concepts
to Stable Diffusion without modifying the main
model's weights. It achieves results comparable to
LoRA while greatly reducing training time, which
makes it a practical choice for rapid adaptation.
Although fine-tuning still outperforms MoFu in
quality, this trade-off between speed and quality
makes MoFu a valuable option when fast training
matters. Future work may focus on improving MoFu's
implementation and output quality.</p>
  </section>
  
  <footer>
    <div class="buttons">
      <a href="mailto:parsee.mizuhashi.th11@gmail.com" class="button">Yoinked</a>
      <a href="https://huggingface.co/organizations/touhou-ai-experimental" class="button">Touhou AI Experimental Group</a>
      <a href="https://huggingface.co/mofu-team" class="button">MoFu</a>
      <a href="https://github.com/yoinked-h/MoFu" class="button" target="_blank">GitHub</a>
    </div>
  </footer>
</body>
</html>