RunDiffusion committed on
Commit 313c66b · verified · 1 Parent(s): 21d846c

About done

Files changed (1)
  1. README.md +75 -173
README.md CHANGED
@@ -63,8 +63,6 @@ widget:
  license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
  ---
 
-
-
  <div style="display: flex; align-items: center; justify-content: space-between;"> <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/RD-Logo-dark.jpg" alt="Left Image" style="width: 30%;">
  <p style="text-align: center; width: 40%;">
  <span style="font-weight: bold; font-size: 1.5em;">Flux Training Concept - Wonderman POC</span><br>
@@ -92,7 +90,7 @@ Flux thinks that "Wonderman" is "Superman"
 
 
  ## Data Used for Training
- You can view the [RAW low quality data here: ](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Raw%20Low%20Quality%20Data).
  The training data was low resolution, cropped, oddly shaped, pixelated, and overall the worst possible data we've come across. That didn't stop us! AI to the rescue!
  ![Low Quality Training Data](Huggingface-assets/multiple-samples-training-data.png)
 
@@ -114,191 +112,95 @@ A vintage comic book cover of Wonderman. On the cover, there are three main char
  ![Vintage Wonderman](Cleaned%20and%20Captioned%20Data/00008.png)
 
  Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Wonderman has a muscular physique, brown hair, and is wearing a black mask covering his eyes. He stands confidently with his hands by his sides. photo
- ![Standing Wonderman](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Cleaned%20and%20Captioned%20Data/00002.png)
 
  ### Train the Data
  All tasks were performed on a local workstation equipped with an RTX 4090, i7 processor, and 64GB RAM. Note that 32GB RAM will not suffice, as you may encounter out-of-memory (OOM) errors when caching latents. We did use RunDiffusion.com for testing the LoRAs created, enabling us to launch five servers with five checkpoints to determine the best one that converged.
  We're not going to dive into rank, learning rate, and the rest here, because those choices really depend on your goals and what you're trying to accomplish. But the rules below are good ones to follow.
- - We used Ostris's ai-toolkit available here: https://github.com/ostris/ai-toolkit/tree/main
- - Default config with LR: 4e-4 at Rank 16
  - 2200 - 2600 steps saw good convergence. Even some checkpoints into the 4k step range turned out pretty good.
  If targeting finer details, you may want to adjust the rank up to 32 and lower the learning rate. You will also need to run more steps if you do this.
  **Training a style:** Using simple captions with clear examples to maintain a coherent style is crucial. Although caption-less LoRAs can sometimes work for styles, this was not within the scope of our goals, so we cannot provide specific insights.
  **Training a concept:** You can choose either descriptive captions to avoid interfering with existing tokens or general captions that might interfere, depending on your intention. This choice should be intentional.
 
  Captioning has never been more critical. Flux "gives you what you ask for" - and that's a good thing. You can train a LoRA on a single cartoon concept and still generate photo-realistic people. You can even caption a cartoon in the foreground and a realistic scene in the background! This capability is BY DESIGN - so do not resist it - embrace it! (Spoiler alert next!)
- ![prompt different backgrounds]()
  You'll see in the next page of examples where the captioning really helps or hurts you. Again, depending on your goals, you will need to choose the path that fits what you're trying to accomplish.
  Total training time for the LoRA was about 2 to 2.5 hours: $1 to $2 on RunPod or Vast, and local electricity is even cheaper.
  Now for the results! (This next file is big to preserve the quality)
 
- ## 500 Steps
 
  ![500 steps](Huggingface-assets/500-steps.jpg)
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
-
-
  - **Developed by:** Darin Holbrook - RunDiffusion co-founder and Chief Technology Officer
  - **Funded by:** RunDiffusion.com / RunPod.io
  - **Model type:** Flux [dev] LoRA
 
  license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
  ---
 
  <div style="display: flex; align-items: center; justify-content: space-between;"> <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/RD-Logo-dark.jpg" alt="Left Image" style="width: 30%;">
  <p style="text-align: center; width: 40%;">
  <span style="font-weight: bold; font-size: 1.5em;">Flux Training Concept - Wonderman POC</span><br>
 
 
 
  ## Data Used for Training
+ You can view the [RAW low-quality data here](https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/tree/main/Raw%20Low%20Quality%20Data).
  The training data was low resolution, cropped, oddly shaped, pixelated, and overall the worst possible data we've come across. That didn't stop us! AI to the rescue!
  ![Low Quality Training Data](Huggingface-assets/multiple-samples-training-data.png)
 
 
  ![Vintage Wonderman](Cleaned%20and%20Captioned%20Data/00008.png)
 
  Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Wonderman has a muscular physique, brown hair, and is wearing a black mask covering his eyes. He stands confidently with his hands by his sides. photo
+ ![Standing Wonderman](Cleaned%20and%20Captioned%20Data/00002.png)
 
  ### Train the Data
  All tasks were performed on a local workstation equipped with an RTX 4090, i7 processor, and 64GB RAM. Note that 32GB RAM will not suffice, as you may encounter out-of-memory (OOM) errors when caching latents. We did use RunDiffusion.com for testing the LoRAs created, enabling us to launch five servers with five checkpoints to determine the best one that converged.
  We're not going to dive into rank, learning rate, and the rest here, because those choices really depend on your goals and what you're trying to accomplish. But the rules below are good ones to follow.
+ - We used Ostris's ai-toolkit, available here: [Ostris ai-toolkit](https://github.com/ostris/ai-toolkit/tree/main)
+ - Default config with LR 4e-4 at Rank 16
  - 2200 - 2600 steps saw good convergence. Even some checkpoints into the 4k step range turned out pretty good.
  If targeting finer details, you may want to adjust the rank up to 32 and lower the learning rate. You will also need to run more steps if you do this.
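To make the rules above concrete, here is a hedged sketch of the run as a plain Python summary. Only the learning rate, rank, and step counts come from the notes; the field names and layout are illustrative and are not ai-toolkit's actual YAML config schema.

```python
# Illustrative summary of the POC training run described above.
# Only lr, rank, and the step range come from the write-up; the dict
# layout is hypothetical, not ai-toolkit's real config format.
base_run = {
    "lr": 4e-4,     # default config learning rate
    "rank": 16,     # LoRA rank used for this POC
    "steps": 2600,  # good convergence seen around 2200-2600 steps
}

# For finer detail the notes suggest raising rank to 32, lowering the LR
# (exact value unspecified in the write-up), and running more steps.
fine_detail_run = {**base_run, "rank": 32, "steps": 4000}

# Rough per-step time implied by "about 2 to 2.5 hours" for ~2200-2600 steps:
seconds_per_step = (2.25 * 3600) / 2400  # roughly 3.4 s/step on an RTX 4090
print(f"{seconds_per_step:.1f} s/step")
```

At roughly $0.50 to $1 per GPU-hour on rented hardware, that 2 to 2.5 hour window lines up with the $1 to $2 cost quoted below.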
  **Training a style:** Using simple captions with clear examples to maintain a coherent style is crucial. Although caption-less LoRAs can sometimes work for styles, this was not within the scope of our goals, so we cannot provide specific insights.
  **Training a concept:** You can choose either descriptive captions to avoid interfering with existing tokens or general captions that might interfere, depending on your intention. This choice should be intentional.
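To make the two strategies concrete, here is an illustrative pair of captions. The descriptive one is the real caption shown earlier in this card; the general one is a made-up contrast example, not taken from the actual dataset.

```python
# Two caption styles for the same training image. The descriptive caption is
# the real one shown earlier in this card; the general caption is a made-up
# contrast example and is not from the actual training set.
descriptive = (
    "Wonderman, a male superhero character. He is wearing a green and red "
    "costume with a large 'W' emblem on the chest. Wonderman has a muscular "
    "physique, brown hair, and is wearing a black mask covering his eyes. "
    "He stands confidently with his hands by his sides. photo"
)

# Short and generic: leans on (and can interfere with) what the base model
# already associates with nearby tokens, e.g. "Superman".
general = "Wonderman standing confidently"

# Descriptive captions carry the concept in the text itself, so they run much longer.
print(len(descriptive.split()) > 5 * len(general.split()))
```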
 
  Captioning has never been more critical. Flux "gives you what you ask for" - and that's a good thing. You can train a LoRA on a single cartoon concept and still generate photo-realistic people. You can even caption a cartoon in the foreground and a realistic scene in the background! This capability is BY DESIGN - so do not resist it - embrace it! (Spoiler alert next!)
+ ![prompt different backgrounds](Huggingface-assets/cartoon-foreground-realistic-background.jpg)
  You'll see in the next page of examples where the captioning really helps or hurts you. Again, depending on your goals, you will need to choose the path that fits what you're trying to accomplish.
  Total training time for the LoRA was about 2 to 2.5 hours: $1 to $2 on RunPod or Vast, and local electricity is even cheaper.
  Now for the results! (This next file is big to preserve the quality)
 
+ # 500 Steps
+ Right off the bat at 500 steps you will get some likeness. This will mostly be baseline Flux. If you're training a concept that already exists, you will see some convergence even at just 500 steps.
  ![500 steps](Huggingface-assets/500-steps.jpg)
+ Prompt: a vintage comic book cover for Wonderman, featuring three characters in a dynamic action scene. The central figure is Wonderman with a confident expression, wearing a green shirt with a yellow belt and red gloves. To his left is a woman with a look of concern, dressed in a yellow top and red skirt. On the right, there's a monstrous creature with sharp teeth and claws, seemingly attacking the man. The background is minimal, primarily blue with a hint of landscape at the bottom. The text WONDER COMICS and No. 11 suggests this is from a series.
+
+ # 1250 Steps
+ It will start to break apart a little bit here. Be patient. It's learning.
+ ![1250 steps](Huggingface-assets/1250-steps.jpg)
+ Prompt: A vintage comic book cover titled 'Wonderman Comics'. The central figure is Wonderman, who appears to be in a combat stance. He is lunging at a large, menacing creature with a gaping mouth, revealing sharp teeth. Below the main characters, there's a woman in a yellow dress holding a small device, possibly a gun. She seems to be in distress. In the background, there's a futuristic-looking tower with a few figures standing atop. The overall color palette is vibrant, with dominant yellows, greens, and purples.
+
+ # 1750 Steps
+ Hey! We're getting somewhere! The caption as a prompt should be showing our subject well at this stage, but the real test is breaking away from the caption to see if our subject is present.
+ ![1750 steps](Huggingface-assets/1750-steps.jpg)
+ Prompt: Wonderman wearing a green and red costume with a large 'W' emblem on the chest standing heroically
+
+ # 2500 Steps
+ There he is! We can now prompt more freely to get Wonderman doing other stuff. Keep in mind we will still be limited to what we trained on, but at least we have a great starting point!
+ ![2500 steps](Huggingface-assets/2500-steps.jpg)
+ Prompt: comic style illustration of Wonderman running from aliens on the moon. center character is Wonderman, a male superhero character. He is wearing a green and red costume with a large 'W' emblem on the chest. Black boots to his knees. Wonderman is wearing a black mask covering his eyes
+
+ <div style="display: grid; grid-template-columns: repeat(auto-fill, minmax(300px, 1fr)); gap: 10px;">
154
+ <div>
155
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/Cosplay-realistic.jpg" alt="Image 1" style="width: 100%;">
156
+ <p style="text-align: center;">"Cosplay" always get's super heroes to appear realistic</p>
157
+ </div>
158
+ <div>
159
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/anime-sample.jpg" alt="Image 2" style="width: 100%;">
160
+ <p style="text-align: center;">Anime Style</p>
161
+ </div>
162
+ <div>
163
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/cosplay-realistic-2.jpg" alt="Image 3" style="width: 100%;">
164
+ <p style="text-align: center;">Photo Realistic</p>
165
+ </div>
166
+ <div>
167
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/comic-book-style-moon.jpg" alt="Image 3" style="width: 100%;">
168
+ <p style="text-align: center;">Illustration Style</p>
169
+ </div>
170
+ </div>
171
+
172
+ # Conclusion
173
+ This proof of concept provided valuable insights into working with Flux. One of the key lessons we took away is that while Flux is straightforward to train, it's crucial to clearly define your objectives before diving in. Without a clear vision, you might find your model either overwhelming or underwhelming—especially with concepts like "a cookie," which already has extensive representation within Flux.
174
+ Every training project comes with its own distinct set of challenges and complexities. Patience and persistence are essential to navigate these obstacles successfully. With careful planning and a focused approach, you'll be able to overcome these hurdles and achieve your desired outcomes.
175
+
176
+ ## Things We Would Do Different
177
+ Upon reviewing our example data, we identified several areas that could benefit from additional cleanup. These issues impacted the final model, leading to some unexpected results. For instance, when "Wonderman" was prompted, the model occasionally generated elements of "Superman" due to similarities between the two. This led to the appearance of a "cape" in an lot of generations. Another issue we found was the appearance of multicolored tights, with some samples showing red while others displayed green. Additionally, the model produced purple shorts again, which was a direct result of the training data.
178
+ While these challenges surfaced during the process, we believe they can be resolved with further refinement and adjustment of the dataset. Addressing these inconsistencies will improve the accuracy and quality of the model output.
179
+
180
+ <div style="display: grid; grid-template-columns: repeat(auto-fill, minmax(300px, 1fr)); gap: 10px;">
181
+ <div>
182
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/wonderman-superman-mix.jpg" alt="Image 1" style="width: 100%;">
183
+ <p style="text-align: center;">"Wonderman running from an alien on the moon with the earth in the sky in the background" and sometimes we get "Superman"</p>
184
+ </div>
185
+ <div>
186
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/wonderman-cape.jpg" alt="Image 2" style="width: 100%;">
187
+ <p style="text-align: center;">"Cape" is present even though it was not prompted</p>
188
+ </div>
189
+ <div>
190
+ <img src="https://huggingface.co/RunDiffusion/Wonderman-Flux-POC/resolve/main/Huggingface-assets/wonderman-two-colors-tights.jpg" alt="Image 3" style="width: 100%;">
191
+ <p style="text-align: center;">Red and green tights were both present due to the training data which had samples of both.</p>
192
+ </div>
193
+ </div>
194
+
195
+ [View all the generations here](Wonderman%20Generation%20Samples)
196
+
197
+ # Special Thanks
198
+ The team had a blast with this project and we can't wait to start the next one. The Wonderman LoRA will be available for download on Huggingface and Civitai for research. At RunDiffusion we are always trying to push the boundaries with whatever bleeding-edge tech comes out. We appreciate all our customers and supporters. Without you we would not have the funds to dedicate a team to research, so thank you!
199
+ - **Ostris**: [Thank you to Ostris](https://x.com/ostrisai) for the awesome training tools!
200
+ - **Mint**: [Thank you to Mint](https://x.com/araminta_k) for the awesome YouTube tutorial!
201
+ - **RunPod**: For the compute credit
202
+
203
+ # More Credits
 
  - **Developed by:** Darin Holbrook - RunDiffusion co-founder and Chief Technology Officer
  - **Funded by:** RunDiffusion.com / RunPod.io
  - **Model type:** Flux [dev] LoRA