wangkanai committed · verified
Commit fa4393e · 1 Parent(s): e8413da

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +131 -183
README.md CHANGED
@@ -9,92 +9,82 @@ tags:
  - fp16
  ---

- <!-- README Version: v1.2 -->

- # FLUX.1-dev FP16 Model Repository

- High-quality text-to-image generation model from Black Forest Labs in FP16 precision format. FLUX.1-dev delivers state-of-the-art image synthesis with exceptional prompt adherence, visual quality, and detail preservation.

  ## Model Description

- FLUX.1-dev is a 12 billion parameter rectified flow transformer capable of generating high-resolution images from text descriptions. This FP16 precision version maintains maximum quality with no quantization loss, ideal for professional workflows requiring the highest fidelity output.

  **Key Capabilities**:
- - Advanced text-to-image generation with complex prompt understanding
- - High-resolution output (up to 2048x2048 and beyond)
- - Excellent composition, lighting, and detail rendering
- - Strong prompt adherence and instruction following
- - Superior handling of text rendering within images
- - Support for various artistic styles and photorealistic generation

  ## Repository Contents

- This repository contains the complete FLUX.1-dev FP16 model organized by component type:
-
  ```
  flux-dev-fp16/
  ├── checkpoints/flux/
- │   └── flux1-dev-fp16.safetensors (23 GB)    # Complete model checkpoint
- ├── diffusion_models/flux/
- │   └── flux1-dev-fp16.safetensors (23 GB)    # Diffusion model weights
- ├── text_encoders/
- │   ├── clip_l.safetensors (235 MB)           # CLIP-L text encoder
- │   ├── clip_g.safetensors (1.3 GB)           # CLIP-G text encoder
- │   ├── clip-vit-large.safetensors (1.6 GB)   # CLIP ViT-Large encoder
- │   └── t5xxl_fp16.safetensors (9.2 GB)       # T5-XXL text encoder
  ├── clip/
- │   └── t5xxl_fp16.safetensors (9.2 GB)       # T5-XXL encoder (alternate location)
  ├── clip_vision/
- │   └── clip_vision_h.safetensors (1.2 GB)    # CLIP vision encoder
  └── vae/flux/
-     └── flux-vae-bf16.safetensors (160 MB)    # VAE decoder in BF16 precision

- Total Repository Size: ~72 GB
  ```

- **Model Components**:
- - **Main Model**: `flux1-dev-fp16.safetensors` (23 GB) - Core diffusion transformer
- - **Text Encoders**: CLIP-L, CLIP-G, T5-XXL for advanced text understanding
- - **Vision Encoder**: CLIP vision model for image understanding capabilities
- - **VAE**: `flux-vae-bf16.safetensors` (160 MB) - Variational autoencoder for latent/image conversion
-
  ## Hardware Requirements

- **Minimum Requirements** (for basic inference):
- - **GPU**: NVIDIA RTX 4090 (24 GB VRAM) or equivalent
  - **RAM**: 32 GB system memory
- - **Storage**: 80 GB free disk space
- - **OS**: Windows 10/11, Linux (Ubuntu 20.04+)

- **Recommended Requirements** (for optimal performance):
- - **GPU**: NVIDIA A100 (40/80 GB VRAM) or RTX 6000 Ada
  - **RAM**: 64 GB system memory
- - **Storage**: NVMe SSD with 100+ GB free space
- - **OS**: Linux with CUDA 12.1+

- **Performance Notes**:
- - FP16 precision requires substantial VRAM (20+ GB for standard workflows)
- - Batch generation and high resolutions require additional memory
- - Consider FP8 or quantized versions for lower VRAM requirements
- - Generation time: ~10-30 seconds per image depending on hardware and resolution

  ## Usage Examples

- ### Basic Text-to-Image Generation (Diffusers)

  ```python
  import torch
  from diffusers import FluxPipeline

- # Load the FLUX.1-dev model
- pipe = FluxPipeline.from_single_file(
-     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
      torch_dtype=torch.float16
  )
- pipe.to("cuda")

  # Generate an image
- prompt = "A serene mountain landscape at sunset, with dramatic clouds and golden light"
  image = pipe(
      prompt=prompt,
      num_inference_steps=50,
@@ -106,179 +96,137 @@ image = pipe(
  image.save("output.png")
  ```

- ### Advanced Generation with Text Encoders

  ```python
- import torch
- from diffusers import FluxPipeline
- from transformers import CLIPTextModel, T5EncoderModel

- # Load text encoders separately for fine control
- text_encoder = CLIPTextModel.from_pretrained(
      "E:/huggingface/flux-dev-fp16/text_encoders",
-     torch_dtype=torch.float16
  )

- text_encoder_2 = T5EncoderModel.from_pretrained(
      "E:/huggingface/flux-dev-fp16/text_encoders",
-     subfolder="t5xxl_fp16",
-     torch_dtype=torch.float16
  )

- # Load FLUX pipeline with custom encoders
- pipe = FluxPipeline.from_single_file(
-     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
-     text_encoder=text_encoder,
-     text_encoder_2=text_encoder_2,
-     torch_dtype=torch.float16
  )
- pipe.to("cuda")
-
- # Generate with advanced parameters
- image = pipe(
-     prompt="A highly detailed cyberpunk street scene with neon signs and rain",
-     negative_prompt="blurry, low quality, distorted",
-     num_inference_steps=75,
-     guidance_scale=8.0,
-     height=1536,
-     width=1024
- ).images[0]
-
- image.save("cyberpunk_output.png")
  ```

- ### Memory-Efficient Generation

- ```python
- import torch
- from diffusers import FluxPipeline

- # Enable memory optimizations
- pipe = FluxPipeline.from_single_file(
-     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
-     torch_dtype=torch.float16
- )

- # Enable CPU offloading for lower VRAM usage
- pipe.enable_model_cpu_offload()

- # Enable attention slicing
- pipe.enable_attention_slicing(1)

- # Enable VAE slicing for high-resolution outputs
- pipe.enable_vae_slicing()

- # Generate image with optimizations
- image = pipe(
-     prompt="An artistic portrait with intricate details",
-     num_inference_steps=50,
-     height=1024,
-     width=1024
- ).images[0]

- image.save("optimized_output.png")
  ```

- ## Model Specifications

- | Specification | Details |
- |--------------|---------|
- | **Architecture** | Rectified Flow Transformer |
- | **Parameters** | 12 billion |
- | **Precision** | FP16 (16-bit floating point) |
- | **Format** | SafeTensors |
- | **Base Resolution** | 1024x1024 (supports flexible resolutions) |
- | **Max Resolution** | 2048x2048+ (hardware dependent) |
- | **Text Encoders** | CLIP-L, CLIP-G, T5-XXL |
- | **Inference Steps** | 20-100 (50 recommended) |
- | **Guidance Scale** | 7.0-9.0 (7.5 recommended) |
-
- **Supported Features**:
- - Text-to-image generation
- - Complex prompt understanding
- - Multi-aspect ratio generation
- - Img2img workflows
- - Inpainting and outpainting
- - ControlNet compatibility
- - LoRA fine-tuning support
-
- ## Performance Tips & Optimization
-
- **Speed Optimization**:
- - Use 20-30 inference steps for faster generation (slight quality trade-off)
- - Enable `xformers` or `torch.compile()` for attention optimization
- - Reduce guidance scale to 6.0-7.0 for faster convergence
- - Use lower resolutions (512x512, 768x768) for draft iterations
-
- **Memory Optimization**:
- - Enable CPU offloading: `pipe.enable_model_cpu_offload()`
- - Enable attention slicing: `pipe.enable_attention_slicing()`
- - Enable VAE slicing: `pipe.enable_vae_slicing()`
- - Use sequential CPU offload for extreme memory constraints
- - Consider switching to FP8 version for 50% memory reduction
-
- **Quality Optimization**:
- - Use 50-75 inference steps for maximum quality
- - Guidance scale 7.5-8.5 for strong prompt adherence
- - Add negative prompts to avoid common artifacts
- - Use higher resolutions (1536x1024, 2048x2048) for detail
- - Experiment with different samplers (DPM++, Euler a)
-
- **Workflow Optimization**:
- - Pre-load models at startup to avoid repeated loading
- - Batch generate similar prompts for efficiency
- - Cache text encoder outputs for prompt variations
- - Use FP16 mixed precision training for fine-tuning

  ## License

- FLUX.1-dev is licensed under the **Apache License 2.0**.

  **Usage Terms**:
- - Free for personal, research, and commercial use
- - Attribution to Black Forest Labs appreciated
- - No warranty provided, use at your own risk
- - See official license documentation for full terms

- **Ethical Use Guidelines**:
- - Do not generate harmful, illegal, or unethical content
- - Respect copyright and intellectual property
- - Follow platform-specific content policies
- - Consider social impact of generated media

  ## Citation

- If you use FLUX.1-dev in your research or projects, please cite:

  ```bibtex
- @software{flux1_dev_2024,
-   title = {FLUX.1-dev: High-Quality Text-to-Image Generation},
-   author = {Black Forest Labs},
-   year = {2024},
-   url = {https://huggingface.co/black-forest-labs/FLUX.1-dev},
-   note = {FP16 precision version}
  }
  ```

- ## Links & Resources

- **Official Resources**:
- - Original Model: [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
- - Black Forest Labs: [https://blackforestlabs.ai](https://blackforestlabs.ai)
- - Documentation: [FLUX.1 Technical Documentation](https://blackforestlabs.ai/docs)

- **Community & Support**:
- - Hugging Face Diffusers: [https://github.com/huggingface/diffusers](https://github.com/huggingface/diffusers)
- - Community Forum: [Hugging Face Forums](https://discuss.huggingface.co/)
- - ComfyUI Integration: [ComfyUI FLUX Nodes](https://github.com/comfyanonymous/ComfyUI)

- **Related Models**:
- - FLUX.1-schnell (Fast version): [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)
- - FLUX.1-dev FP8 (Memory efficient): Available in sibling repository

  ---

- **Model Version**: FLUX.1-dev
- **Precision**: FP16
- **Repository Version**: v1.2
- **Last Updated**: 2025-10-14

  - fp16
  ---

+ <!-- README Version: v1.4 -->

+ # FLUX.1-dev FP16

+ High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.

  ## Model Description

+ FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version keeps the weights at their released 16-bit precision with no further quantization, ideal for creative professionals and researchers requiring the highest image quality.

  **Key Capabilities**:
+ - High-resolution text-to-image generation
+ - Advanced prompt understanding with the T5-XXL text encoder
+ - Superior detail and coherence in generated images
+ - Wide range of artistic styles and subjects
+ - Multi-text-encoder architecture (CLIP + T5)

  ## Repository Contents

  ```
  flux-dev-fp16/
  ├── checkpoints/flux/
+ │   └── flux1-dev-fp16.safetensors     # 23 GB - Complete model checkpoint
  ├── clip/
+ │   └── t5xxl_fp16.safetensors         # 9.2 GB - T5-XXL text encoder
  ├── clip_vision/
+ │   └── clip_vision_h.safetensors      # 1.2 GB - CLIP vision encoder
+ ├── diffusion_models/flux/
+ │   └── flux1-dev-fp16.safetensors     # 23 GB - Diffusion model
+ ├── text_encoders/
+ │   ├── clip-vit-large.safetensors     # 1.6 GB - CLIP ViT-Large encoder
+ │   ├── clip_g.safetensors             # 1.3 GB - CLIP-G encoder
+ │   ├── clip_l.safetensors             # 235 MB - CLIP-L encoder
+ │   └── t5xxl_fp16.safetensors         # 9.2 GB - T5-XXL encoder
  └── vae/flux/
+     └── flux-vae-bf16.safetensors      # 160 MB - VAE decoder (BF16)

+ Total Size: ~72 GB
  ```

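+ A quick way to confirm a download matches the layout above (an
+ illustrative sketch; the root path is this repo's example location):
+
+ ```python
+ import os
+
+ root = "E:/huggingface/flux-dev-fp16"
+ for dirpath, _, filenames in os.walk(root):
+     for name in filenames:
+         if name.endswith(".safetensors"):
+             path = os.path.join(dirpath, name)
+             # Report each weight file and its size in GB
+             print(f"{os.path.relpath(path, root)}: "
+                   f"{os.path.getsize(path) / 1024**3:.2f} GB")
+ ```
+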
  ## Hardware Requirements

+ ### Minimum Requirements
+ - **VRAM**: 24 GB (RTX 3090, RTX 4090, A5000)
  - **RAM**: 32 GB system memory
+ - **Disk Space**: 80 GB free space
+ - **GPU**: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

+ ### Recommended Requirements
+ - **VRAM**: 32+ GB (RTX 6000 Ada, A6000, H100)
  - **RAM**: 64 GB system memory
+ - **Disk Space**: 100+ GB for workspace and outputs
+ - **GPU**: RTX 6000 Ada or comparable professional GPUs

+ ### Performance Notes
+ - FP16 precision provides the best quality but the highest VRAM usage; a pre-flight VRAM check is sketched below
+ - Consider the FP8 version if VRAM is limited (see the `flux-dev-fp8` directory)
+ - Generation time: ~30-60 seconds per image at 1024x1024, depending on GPU

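+ A minimal pre-flight check against the minimum above (the 24 GB figure
+ comes from this section; device index 0 is an assumption):
+
+ ```python
+ import torch
+
+ props = torch.cuda.get_device_properties(0)
+ vram_gb = props.total_memory / 1024**3
+ print(f"{props.name}: {vram_gb:.1f} GB VRAM")
+ if vram_gb < 24:
+     print("Below the 24 GB minimum for FP16 - consider the FP8 build")
+ ```
+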
  ## Usage Examples

+ ### Using with Diffusers Library

  ```python
  import torch
  from diffusers import FluxPipeline

+ # Load the pipeline from the single-file checkpoint. This repo uses a
+ # ComfyUI-style layout of bare .safetensors files, so from_single_file
+ # is the right entry point; from_pretrained expects a diffusers model
+ # directory containing model_index.json.
+ pipe = FluxPipeline.from_single_file(
+     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
      torch_dtype=torch.float16
  )
+ pipe = pipe.to("cuda")

  # Generate an image
+ prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
  image = pipe(
      prompt=prompt,
      num_inference_steps=50,

  image.save("output.png")
  ```

+ ### Using with ComfyUI
+
+ 1. Copy model files to the ComfyUI directories (a copy script is sketched below):
+    - `checkpoints/flux/flux1-dev-fp16.safetensors` → `ComfyUI/models/checkpoints/`
+    - `text_encoders/*.safetensors` → `ComfyUI/models/clip/`
+    - `vae/flux/flux-vae-bf16.safetensors` → `ComfyUI/models/vae/`
+
+ 2. In ComfyUI:
+    - Load Checkpoint: Select `flux1-dev-fp16`
+    - Text Encoder: Loaded automatically with the checkpoint
+    - VAE: Select `flux-vae-bf16`
+
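+ A minimal copy sketch for step 1 (both root paths are examples; adjust
+ them to your installation):
+
+ ```python
+ import glob
+ import shutil
+
+ src = "E:/huggingface/flux-dev-fp16"
+ dst = "C:/ComfyUI/models"  # assumed ComfyUI install location
+
+ # Checkpoint, text encoders, and VAE go to their ComfyUI folders
+ shutil.copy2(f"{src}/checkpoints/flux/flux1-dev-fp16.safetensors", f"{dst}/checkpoints/")
+ for f in glob.glob(f"{src}/text_encoders/*.safetensors"):
+     shutil.copy2(f, f"{dst}/clip/")
+ shutil.copy2(f"{src}/vae/flux/flux-vae-bf16.safetensors", f"{dst}/vae/")
+ ```
+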
+ ### Using Individual Components

  ```python
+ import torch
+ from diffusers import AutoencoderKL
+ from safetensors.torch import load_file
+
+ # The text encoder files here are bare state dicts, so read them with
+ # safetensors. (transformers' from_pretrained has no `filename` argument
+ # and expects a full model directory; turning these weights into a usable
+ # T5/CLIP module also requires the matching config from the upstream repos.)
+ t5_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/t5xxl_fp16.safetensors")
+ clip_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/clip_l.safetensors")
+
+ # The VAE can be loaded directly from its single checkpoint file
+ vae = AutoencoderKL.from_single_file(
+     "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
+     torch_dtype=torch.bfloat16,
+ )
  ```

+ ## Model Specifications

+ **Architecture**:
+ - **Type**: Latent Diffusion Transformer
+ - **Parameters**: ~12B (diffusion model)
+ - **Text Encoders**:
+   - T5-XXL: 4.7B parameters (FP16, 9.2 GB)
+   - CLIP-G: 1.3 GB (FP16)
+   - CLIP-L: 235 MB (FP16)
+ - **VAE**: BF16 precision (160 MB)

+ **Precision**:
+ - **Diffusion Model**: FP16 (float16)
+ - **Text Encoders**: FP16 (float16)
+ - **VAE**: BF16 (bfloat16)

+ **Format**:
+ - `.safetensors` - secure tensor format with fast loading

+ **Resolution Support**:
+ - Native: 1024x1024
+ - Range: 512x512 to 2048x2048
+ - Aspect ratios: non-square resolutions supported (see the sketch below)

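+ A minimal non-square render, reusing `pipe` from the Diffusers example
+ above (the 1344x768 pair is an illustrative widescreen choice, not a
+ documented preset):
+
+ ```python
+ image = pipe(
+     prompt="Wide-angle alpine valley at dawn, soft morning haze",
+     height=768,
+     width=1344,  # dimensions are commonly kept to multiples of 16
+     num_inference_steps=50,
+ ).images[0]
+ image.save("widescreen.png")
+ ```
+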
+ ## Performance Tips

+ ### Memory Optimization
+ ```python
+ # Enable sliced (memory-efficient) attention
+ pipe.enable_attention_slicing()
+
+ # Enable VAE tiling for high resolutions
+ pipe.enable_vae_tiling()
+
+ # Use CPU offloading if VRAM is limited (slower; call this instead of pipe.to("cuda"))
+ pipe.enable_sequential_cpu_offload()
  ```

+ ### Speed Optimization
+ ```python
+ # Use torch.compile for faster inference (PyTorch 2.0+).
+ # FLUX uses a transformer backbone; FluxPipeline has no `unet` attribute.
+ pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
+
+ # Reduce inference steps (trade quality for speed; the examples above use 50)
+ image = pipe(prompt, num_inference_steps=25).images[0]
+ ```

+ ### Quality Optimization
+ - Use 50-75 inference steps for best quality
+ - Guidance scale: 7-9 for balanced results
+ - Higher guidance (10-15) for stronger prompt adherence
+ - Consider prompt engineering for better results; an example of quality-leaning settings follows

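+ A short sketch of quality-leaning settings, with values taken from the
+ list above (the prompt is illustrative):
+
+ ```python
+ image = pipe(
+     prompt="Portrait of a clockmaker in a sunlit workshop, 35mm photo",
+     num_inference_steps=60,  # within the 50-75 quality range
+     guidance_scale=8.0,      # 7-9 balanced; push past 10 for stricter adherence
+ ).images[0]
+ image.save("quality.png")
+ ```
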
  ## License

+ This model is distributed under the **FLUX.1 [dev] Non-Commercial License**, not Apache 2.0 (Apache 2.0 covers the separate FLUX.1-schnell model).

  **Usage Terms**:
+ - ✅ Personal, research, and other non-commercial use allowed
+ - ✅ Generated outputs may be used as described in the license
+ - ⚠️ Commercial use of the model weights requires a separate agreement with Black Forest Labs
+ - ⚠️ No warranty provided; use at your own risk

+ See the FLUX.1 [dev] Non-Commercial License for full terms.

  ## Citation

+ If you use this model in your research or projects, please cite:

  ```bibtex
+ @misc{flux-dev,
+   title={FLUX.1-dev: High-Quality Text-to-Image Generation},
+   author={Black Forest Labs},
+   year={2024},
+   howpublished={\url{https://blackforestlabs.ai/}}
  }
  ```

+ ## Related Resources

+ - **Official Website**: https://blackforestlabs.ai/
+ - **Model Card**: https://huggingface.co/black-forest-labs/FLUX.1-dev
+ - **Documentation**: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
+ - **Community**: https://huggingface.co/black-forest-labs

+ ## Version Information

+ - **Model Version**: FLUX.1-dev
+ - **Precision**: FP16
+ - **Release**: 2024
+ - **README Version**: v1.4

  ---

+ For the FP8 precision version (lower VRAM usage), see `E:/huggingface/flux-dev-fp8/`