wangkanai committed on
Commit f061223 · verified · 1 Parent(s): c22a27f

Add files using upload-large-folder tool

README.md CHANGED
@@ -1,3 +1,5 @@
  ---
  license: apache-2.0
  library_name: diffusers
@@ -7,98 +9,277 @@ tags:
  - flux
  - flux.1-dev
  - image-generation
- - stable-diffusion
  - fp16
- - full-precision
  base_model: black-forest-labs/FLUX.1-dev
  ---

- # FLUX.1-dev FP16 Model Collection

- This repository contains the FP16 (full precision) variant of the FLUX.1-dev text-to-image generation model. This streamlined collection includes only FP16 precision models optimized for quality.

  ## Model Description

- FLUX.1-dev is a state-of-the-art text-to-image generation model that produces high-quality images from text prompts. This FP16 collection provides the best quality output with full precision weights.

  ## Repository Contents

- **Total Size**: ~72GB

- ### Diffusion Models
- - `diffusion_models/flux1-dev-fp16.safetensors` (23GB) - Main diffusion model
- - `checkpoints/flux1-dev-fp16.safetensors` (23GB) - Checkpoint format

- ### Text Encoders
- - `text_encoders/clip_g.safetensors` (1.3GB) - CLIP-G text encoder
- - `text_encoders/clip_l.safetensors` (235MB) - CLIP-L text encoder
- - `text_encoders/clip-vit-large.safetensors` (1.6GB) - CLIP ViT-Large encoder
- - `text_encoders/t5xxl_fp16.safetensors` (9.2GB) - T5-XXL FP16 text encoder
- - `clip/t5xxl_fp16.safetensors` (9.2GB) - T5-XXL alternative path

- ### Vision Models
- - `clip_vision/clip_vision_h.safetensors` (1.2GB) - CLIP Vision H model

  ## Hardware Requirements

- - **VRAM**: 16GB+ recommended for optimal performance
- - **Disk Space**: 72GB
- - **Precision**: FP16 (full precision)
- - **Memory**: 32GB+ system RAM recommended

- ## Usage

  ```python
- from diffusers import FluxPipeline
  import torch

- # Load the FP16 model
- pipe = FluxPipeline.from_pretrained(
-     "path/to/flux-dev-fp16",
      torch_dtype=torch.float16
  )
-
  pipe.to("cuda")

  # Generate an image
  image = pipe(
-     prompt="a beautiful mountain landscape at sunset",
      num_inference_steps=50,
-     guidance_scale=7.5
  ).images[0]

  image.save("output.png")
  ```

- ## Model Precision Trade-offs

- **FP16 (This Collection)**:
- - Best quality output
- - Full precision weights
- - Requires more VRAM (16GB+)
- - Slower inference compared to FP8
- - Recommended for: Quality-focused applications, professional use

- **Alternatives**:
- - FP8: ~50% smaller, faster inference, minimal quality loss
- - GGUF: Quantized variants for memory-constrained scenarios

  ## License

- This model is released under the Apache 2.0 license.

  ## Citation

  ```bibtex
- @software{flux1-dev,
    author = {Black Forest Labs},
-   title = {FLUX.1-dev},
    year = {2024},
-   publisher = {Hugging Face},
-   url = {https://huggingface.co/black-forest-labs/FLUX.1-dev}
  }
  ```

- ## Model Card Contact

- For questions or issues with this model collection, please refer to the original FLUX.1-dev model card and repository.
+ <!-- README Version: v1.0 -->
+
  ---
  license: apache-2.0
  library_name: diffusers
  - flux
  - flux.1-dev
  - image-generation
  - fp16
+ - diffusion
+ - stable-diffusion
  base_model: black-forest-labs/FLUX.1-dev
  ---

+ # FLUX.1-dev FP16 Model Repository

+ High-quality text-to-image generation model from Black Forest Labs in FP16 precision format. FLUX.1-dev delivers state-of-the-art image synthesis with exceptional prompt adherence, visual quality, and detail preservation.

  ## Model Description

+ FLUX.1-dev is a 12-billion-parameter rectified flow transformer capable of generating high-resolution images from text descriptions. This FP16 precision version maintains maximum quality with no quantization loss, ideal for professional workflows requiring the highest-fidelity output.
+
+ **Key Capabilities**:
+ - Advanced text-to-image generation with complex prompt understanding
+ - High-resolution output (up to 2048x2048 and beyond)
+ - Excellent composition, lighting, and detail rendering
+ - Strong prompt adherence and instruction following
+ - Superior handling of text rendering within images
+ - Support for various artistic styles and photorealistic generation

  ## Repository Contents

+ This repository contains the complete FLUX.1-dev FP16 model organized by component type:

+ ```
+ flux-dev-fp16/
+ ├── checkpoints/flux/
+ │   └── flux1-dev-fp16.safetensors (23 GB)    # Complete model checkpoint
+ ├── diffusion_models/flux/
+ │   └── flux1-dev-fp16.safetensors (23 GB)    # Diffusion model weights
+ ├── text_encoders/
+ │   ├── clip_l.safetensors (235 MB)           # CLIP-L text encoder
+ │   ├── clip_g.safetensors (1.3 GB)           # CLIP-G text encoder
+ │   ├── clip-vit-large.safetensors (1.6 GB)   # CLIP ViT-Large encoder
+ │   └── t5xxl_fp16.safetensors (9.2 GB)       # T5-XXL text encoder
+ ├── clip/
+ │   └── t5xxl_fp16.safetensors (9.2 GB)       # T5-XXL encoder (alternate location)
+ └── clip_vision/
+     └── clip_vision_h.safetensors (1.2 GB)    # CLIP vision encoder
+
+ Total Repository Size: 72 GB
+ ```

+ **Model Components**:
+ - **Main Model**: `flux1-dev-fp16.safetensors` (23 GB) - Core diffusion transformer
+ - **Text Encoders**: CLIP-L, CLIP-G, T5-XXL for advanced text understanding
+ - **Vision Encoder**: CLIP vision model for image understanding capabilities
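A quick sanity check that a local copy matches the layout above can be sketched as follows; the root path is an assumption, so point it at wherever you downloaded the repository:

```python
import os

# Relative paths listed in the repository tree above
EXPECTED_FILES = [
    "checkpoints/flux/flux1-dev-fp16.safetensors",
    "diffusion_models/flux/flux1-dev-fp16.safetensors",
    "text_encoders/clip_l.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/clip-vit-large.safetensors",
    "text_encoders/t5xxl_fp16.safetensors",
    "clip/t5xxl_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
]

def missing_files(root):
    """Return the expected files that are absent under `root`."""
    return [p for p in EXPECTED_FILES if not os.path.isfile(os.path.join(root, p))]

if __name__ == "__main__":
    # Hypothetical local root; replace with your own download path
    missing = missing_files("./flux-dev-fp16")
    if missing:
        print("Missing files:", *missing, sep="\n  ")
    else:
        print("All model files present.")
```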

  ## Hardware Requirements

+ **Minimum Requirements** (for basic inference):
+ - **GPU**: NVIDIA RTX 4090 (24 GB VRAM) or equivalent
+ - **RAM**: 32 GB system memory
+ - **Storage**: 80 GB free disk space
+ - **OS**: Windows 10/11, Linux (Ubuntu 20.04+)
+
+ **Recommended Requirements** (for optimal performance):
+ - **GPU**: NVIDIA A100 (40/80 GB VRAM) or RTX 6000 Ada
+ - **RAM**: 64 GB system memory
+ - **Storage**: NVMe SSD with 100+ GB free space
+ - **OS**: Linux with CUDA 12.1+
+
+ **Performance Notes**:
+ - FP16 precision requires substantial VRAM (20+ GB for standard workflows)
+ - Batch generation and high resolutions require additional memory
+ - Consider FP8 or quantized versions for lower VRAM requirements
+ - Generation time: ~10-30 seconds per image depending on hardware and resolution
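The VRAM floor quoted above follows directly from the parameter count: at 2 bytes per FP16 weight, the 12-billion-parameter transformer alone occupies roughly 24 GB before activations and text encoders are counted. A back-of-the-envelope sketch:

```python
def weight_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

# 12 billion parameters, as stated in the model description
PARAMS = 12e9

fp16 = weight_gb(PARAMS, 2.0)  # 16-bit floats: 2 bytes each
fp8 = weight_gb(PARAMS, 1.0)   # 8-bit variant: half the footprint

print(f"FP16 weights: ~{fp16:.0f} GB")  # ~24 GB
print(f"FP8 weights:  ~{fp8:.0f} GB")   # ~12 GB
```

This matches the ~23 GB on-disk size of `flux1-dev-fp16.safetensors` and the ~50% reduction the FP8 alternative offers.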

+ ## Usage Examples
+
+ ### Basic Text-to-Image Generation (Diffusers)

  ```python
  import torch
+ from diffusers import FluxPipeline

+ # Load the FLUX.1-dev model
+ pipe = FluxPipeline.from_single_file(
+     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
      torch_dtype=torch.float16
  )
  pipe.to("cuda")

  # Generate an image
+ prompt = "A serene mountain landscape at sunset, with dramatic clouds and golden light"
  image = pipe(
+     prompt=prompt,
      num_inference_steps=50,
+     guidance_scale=7.5,
+     height=1024,
+     width=1024
  ).images[0]

  image.save("output.png")
  ```

+ ### Advanced Generation with Text Encoders
+
+ ```python
+ import torch
+ from diffusers import FluxPipeline
+ from transformers import CLIPTextModel, T5EncoderModel
+
+ # Load text encoders separately for fine control
+ # (from the base FLUX.1-dev repo, which ships their configs)
+ text_encoder = CLIPTextModel.from_pretrained(
+     "black-forest-labs/FLUX.1-dev",
+     subfolder="text_encoder",
+     torch_dtype=torch.float16
+ )
+
+ text_encoder_2 = T5EncoderModel.from_pretrained(
+     "black-forest-labs/FLUX.1-dev",
+     subfolder="text_encoder_2",
+     torch_dtype=torch.float16
+ )
+
+ # Load FLUX pipeline with custom encoders
+ pipe = FluxPipeline.from_single_file(
+     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
+     text_encoder=text_encoder,
+     text_encoder_2=text_encoder_2,
+     torch_dtype=torch.float16
+ )
+ pipe.to("cuda")
+
+ # Generate with advanced parameters
+ # (FluxPipeline applies negative_prompt only when true_cfg_scale > 1)
+ image = pipe(
+     prompt="A highly detailed cyberpunk street scene with neon signs and rain",
+     negative_prompt="blurry, low quality, distorted",
+     true_cfg_scale=4.0,
+     num_inference_steps=75,
+     guidance_scale=8.0,
+     height=1536,
+     width=1024
+ ).images[0]
+
+ image.save("cyberpunk_output.png")
+ ```
+
+ ### Memory-Efficient Generation
+
+ ```python
+ import torch
+ from diffusers import FluxPipeline
+
+ # Enable memory optimizations
+ pipe = FluxPipeline.from_single_file(
+     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
+     torch_dtype=torch.float16
+ )
+
+ # Offload idle components to CPU for lower VRAM usage
+ # (do not also call pipe.to("cuda"); offloading manages device placement)
+ pipe.enable_model_cpu_offload()
+
+ # Enable attention slicing
+ pipe.enable_attention_slicing(1)
+
+ # Enable VAE slicing and tiling for high-resolution outputs
+ pipe.vae.enable_slicing()
+ pipe.vae.enable_tiling()
+
+ # Generate image with optimizations
+ image = pipe(
+     prompt="An artistic portrait with intricate details",
+     num_inference_steps=50,
+     height=1024,
+     width=1024
+ ).images[0]
+
+ image.save("optimized_output.png")
+ ```

+ ## Model Specifications
+
+ | Specification | Details |
+ |---------------|---------|
+ | **Architecture** | Rectified flow transformer |
+ | **Parameters** | 12 billion |
+ | **Precision** | FP16 (16-bit floating point) |
+ | **Format** | SafeTensors |
+ | **Base Resolution** | 1024x1024 (supports flexible resolutions) |
+ | **Max Resolution** | 2048x2048+ (hardware dependent) |
+ | **Text Encoders** | CLIP-L, CLIP-G, T5-XXL |
+ | **Inference Steps** | 20-100 (50 recommended) |
+ | **Guidance Scale** | 7.0-9.0 (7.5 recommended) |
+
+ **Supported Features**:
+ - Text-to-image generation
+ - Complex prompt understanding
+ - Multi-aspect ratio generation
+ - Img2img workflows
+ - Inpainting and outpainting
+ - ControlNet compatibility
+ - LoRA fine-tuning support
+
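Why higher resolutions are "hardware dependent": the transformer attends over image tokens, and attention cost grows quadratically with token count. Assuming Flux's 8x VAE downsampling followed by 2x2 patchification (one token per 16x16 pixel block; these factors are stated assumptions, not taken from this repository), token counts scale like this:

```python
def image_tokens(height: int, width: int, pixels_per_token: int = 16) -> int:
    """Tokens the transformer attends over, assuming one token per
    16x16 pixel block (8x VAE downsampling + 2x2 patchification)."""
    return (height // pixels_per_token) * (width // pixels_per_token)

# Doubling each side quadruples the token count
for h, w in [(512, 512), (1024, 1024), (1536, 1024), (2048, 2048)]:
    print(f"{h}x{w}: {image_tokens(h, w)} tokens")
```

Going from 1024x1024 (4096 tokens) to 2048x2048 (16384 tokens) quadruples the sequence length, which is why the higher end of the resolution range needs the recommended-tier hardware.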
+ ## Performance Tips & Optimization
+
+ **Speed Optimization**:
+ - Use 20-30 inference steps for faster generation (slight quality trade-off)
+ - Enable `xformers` or `torch.compile()` for attention optimization
+ - Reduce guidance scale to 6.0-7.0 for faster convergence
+ - Use lower resolutions (512x512, 768x768) for draft iterations
+
+ **Memory Optimization**:
+ - Enable CPU offloading: `pipe.enable_model_cpu_offload()`
+ - Enable attention slicing: `pipe.enable_attention_slicing()`
+ - Enable VAE slicing: `pipe.vae.enable_slicing()`
+ - Use sequential CPU offload (`pipe.enable_sequential_cpu_offload()`) for extreme memory constraints
+ - Consider switching to the FP8 version for a ~50% memory reduction
+
+ **Quality Optimization**:
+ - Use 50-75 inference steps for maximum quality
+ - Guidance scale 7.5-8.5 for strong prompt adherence
+ - Add negative prompts to avoid common artifacts
+ - Use higher resolutions (1536x1024, 2048x2048) for detail
+ - Experiment with alternative schedulers where supported
+
+ **Workflow Optimization**:
+ - Pre-load models at startup to avoid repeated loading
+ - Batch generate similar prompts for efficiency
+ - Cache text encoder outputs for prompt variations
+ - Use FP16 mixed precision training for fine-tuning
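The encoder-caching tip above is a plain memoization pattern. In the sketch below, `encode` is a stand-in for an expensive text-encoder forward pass (e.g. the CLIP + T5 encoding a diffusers pipeline performs per prompt); the names are illustrative, not the library API:

```python
from functools import lru_cache

# Stand-in for an expensive text-encoder forward pass; illustrative only.
def encode(prompt: str) -> tuple:
    return (f"embeds({prompt})", f"pooled({prompt})")

@lru_cache(maxsize=128)
def cached_encode(prompt: str) -> tuple:
    """Encode each distinct prompt once; reuse the embeddings across
    seeds, resolutions, and step counts for the same prompt."""
    return encode(prompt)

# First call computes; the repeat is served from the cache
a = cached_encode("cyberpunk street, neon rain")
b = cached_encode("cyberpunk street, neon rain")
assert a is b  # same cached object, the encoder ran only once
```

With a real pipeline, the cached value would be the prompt embeddings, which can then be passed back in place of the raw prompt on subsequent calls.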

  ## License

+ FLUX.1-dev is licensed under the **Apache License 2.0**.
+
+ **Usage Terms**:
+ - Free for personal, research, and commercial use
+ - Attribution to Black Forest Labs appreciated
+ - No warranty provided; use at your own risk
+ - See the official license documentation for full terms
+
+ **Ethical Use Guidelines**:
+ - Do not generate harmful, illegal, or unethical content
+ - Respect copyright and intellectual property
+ - Follow platform-specific content policies
+ - Consider the social impact of generated media
 
  ## Citation

+ If you use FLUX.1-dev in your research or projects, please cite:
+
  ```bibtex
+ @software{flux1_dev_2024,
+   title = {FLUX.1-dev: High-Quality Text-to-Image Generation},
    author = {Black Forest Labs},
    year = {2024},
+   url = {https://huggingface.co/black-forest-labs/FLUX.1-dev},
+   note = {FP16 precision version}
  }
  ```

+ ## Links & Resources
+
+ **Official Resources**:
+ - Original Model: [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
+ - Black Forest Labs: [https://blackforestlabs.ai](https://blackforestlabs.ai)
+ - Documentation: [FLUX.1 Technical Documentation](https://blackforestlabs.ai/docs)
+
+ **Community & Support**:
+ - Hugging Face Diffusers: [https://github.com/huggingface/diffusers](https://github.com/huggingface/diffusers)
+ - Community Forum: [Hugging Face Forums](https://discuss.huggingface.co/)
+ - ComfyUI Integration: [ComfyUI FLUX Nodes](https://github.com/comfyanonymous/ComfyUI)
+
+ **Related Models**:
+ - FLUX.1-schnell (fast version): [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)
+ - FLUX.1-dev FP8 (memory efficient): available in a sibling repository
+
+ ---
+
+ **Model Version**: FLUX.1-dev
+ **Precision**: FP16
+ **Repository Version**: v1.0
+ **Last Updated**: 2025-10-13
checkpoints/flux/flux1-dev-fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4610115bb0c89560703c892c59ac2742fa821e60ef5871b33493ba544683abd7
+ size 23802932552
diffusion_models/flux/flux1-dev-fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4610115bb0c89560703c892c59ac2742fa821e60ef5871b33493ba544683abd7
+ size 23802932552
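Note that both pointer files above carry the same sha256 oid and size: the two 23 GB checkpoints are byte-identical, so Git LFS stores the content only once. A pointer file is just three key-value lines and can be parsed in a few lines of Python (a sketch, not part of the official git-lfs tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer content committed for both safetensors files above
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4610115bb0c89560703c892c59ac2742fa821e60ef5871b33493ba544683abd7
size 23802932552
"""

info = parse_lfs_pointer(pointer)
print(info["oid"])                    # the shared sha256 oid
print(int(info["size"]) / 1e9, "GB")  # 23.802932552 GB
```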