wangkanai committed · verified
Commit fa4393e · 1 Parent(s): e8413da

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +131 -183
README.md CHANGED
@@ -9,92 +9,82 @@ tags:
  - fp16
  ---

- <!-- README Version: v1.2 -->

- # FLUX.1-dev FP16 Model Repository

- High-quality text-to-image generation model from Black Forest Labs in FP16 precision format. FLUX.1-dev delivers state-of-the-art image synthesis with exceptional prompt adherence, visual quality, and detail preservation.

  ## Model Description

- FLUX.1-dev is a 12 billion parameter rectified flow transformer capable of generating high-resolution images from text descriptions. This FP16 precision version maintains maximum quality with no quantization loss, ideal for professional workflows requiring the highest fidelity output.

  **Key Capabilities**:
- - Advanced text-to-image generation with complex prompt understanding
- - High-resolution output (up to 2048x2048 and beyond)
- - Excellent composition, lighting, and detail rendering
- - Strong prompt adherence and instruction following
- - Superior handling of text rendering within images
- - Support for various artistic styles and photorealistic generation

  ## Repository Contents

- This repository contains the complete FLUX.1-dev FP16 model organized by component type:
-
  ```
  flux-dev-fp16/
  ├── checkpoints/flux/
- │   └── flux1-dev-fp16.safetensors (23 GB)    # Complete model checkpoint
- ├── diffusion_models/flux/
- │   └── flux1-dev-fp16.safetensors (23 GB)    # Diffusion model weights
- ├── text_encoders/
- │   ├── clip_l.safetensors (235 MB)           # CLIP-L text encoder
- │   ├── clip_g.safetensors (1.3 GB)           # CLIP-G text encoder
- │   ├── clip-vit-large.safetensors (1.6 GB)   # CLIP ViT-Large encoder
- │   └── t5xxl_fp16.safetensors (9.2 GB)       # T5-XXL text encoder
  ├── clip/
- │   └── t5xxl_fp16.safetensors (9.2 GB)       # T5-XXL encoder (alternate location)
  ├── clip_vision/
- │   └── clip_vision_h.safetensors (1.2 GB)    # CLIP vision encoder
  └── vae/flux/
-     └── flux-vae-bf16.safetensors (160 MB)    # VAE decoder in BF16 precision

- Total Repository Size: ~72 GB
  ```

- **Model Components**:
- - **Main Model**: `flux1-dev-fp16.safetensors` (23 GB) - Core diffusion transformer
- - **Text Encoders**: CLIP-L, CLIP-G, T5-XXL for advanced text understanding
- - **Vision Encoder**: CLIP vision model for image understanding capabilities
- - **VAE**: `flux-vae-bf16.safetensors` (160 MB) - Variational autoencoder for latent/image conversion
-
  ## Hardware Requirements

- **Minimum Requirements** (for basic inference):
- - **GPU**: NVIDIA RTX 4090 (24 GB VRAM) or equivalent
  - **RAM**: 32 GB system memory
- - **Storage**: 80 GB free disk space
- - **OS**: Windows 10/11, Linux (Ubuntu 20.04+)

- **Recommended Requirements** (for optimal performance):
- - **GPU**: NVIDIA A100 (40/80 GB VRAM) or RTX 6000 Ada
  - **RAM**: 64 GB system memory
- - **Storage**: NVMe SSD with 100+ GB free space
- - **OS**: Linux with CUDA 12.1+

- **Performance Notes**:
- - FP16 precision requires substantial VRAM (20+ GB for standard workflows)
- - Batch generation and high resolutions require additional memory
- - Consider FP8 or quantized versions for lower VRAM requirements
- - Generation time: ~10-30 seconds per image depending on hardware and resolution

  ## Usage Examples

- ### Basic Text-to-Image Generation (Diffusers)

  ```python
  import torch
  from diffusers import FluxPipeline

- # Load the FLUX.1-dev model
- pipe = FluxPipeline.from_single_file(
-     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
      torch_dtype=torch.float16
  )
- pipe.to("cuda")

  # Generate an image
- prompt = "A serene mountain landscape at sunset, with dramatic clouds and golden light"
  image = pipe(
      prompt=prompt,
      num_inference_steps=50,
@@ -106,179 +96,137 @@ image = pipe(
  image.save("output.png")
  ```

- ### Advanced Generation with Text Encoders

  ```python
- import torch
- from diffusers import FluxPipeline
- from transformers import CLIPTextModel, T5EncoderModel

- # Load text encoders separately for fine control
- text_encoder = CLIPTextModel.from_pretrained(
      "E:/huggingface/flux-dev-fp16/text_encoders",
-     torch_dtype=torch.float16
  )

- text_encoder_2 = T5EncoderModel.from_pretrained(
      "E:/huggingface/flux-dev-fp16/text_encoders",
-     subfolder="t5xxl_fp16",
-     torch_dtype=torch.float16
  )

- # Load FLUX pipeline with custom encoders
- pipe = FluxPipeline.from_single_file(
-     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
-     text_encoder=text_encoder,
-     text_encoder_2=text_encoder_2,
-     torch_dtype=torch.float16
  )
- pipe.to("cuda")
-
- # Generate with advanced parameters
- image = pipe(
-     prompt="A highly detailed cyberpunk street scene with neon signs and rain",
-     negative_prompt="blurry, low quality, distorted",
-     num_inference_steps=75,
-     guidance_scale=8.0,
-     height=1536,
-     width=1024
- ).images[0]
-
- image.save("cyberpunk_output.png")
  ```

- ### Memory-Efficient Generation

- ```python
- import torch
- from diffusers import FluxPipeline

- # Enable memory optimizations
- pipe = FluxPipeline.from_single_file(
-     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
-     torch_dtype=torch.float16
- )

- # Enable CPU offloading for lower VRAM usage
- pipe.enable_model_cpu_offload()

- # Enable attention slicing
- pipe.enable_attention_slicing(1)

- # Enable VAE slicing for high-resolution outputs
- pipe.enable_vae_slicing()

- # Generate image with optimizations
- image = pipe(
-     prompt="An artistic portrait with intricate details",
-     num_inference_steps=50,
-     height=1024,
-     width=1024
- ).images[0]

- image.save("optimized_output.png")
  ```

- ## Model Specifications

- | Specification | Details |
- |--------------|---------|
- | **Architecture** | Rectified Flow Transformer |
- | **Parameters** | 12 billion |
- | **Precision** | FP16 (16-bit floating point) |
- | **Format** | SafeTensors |
- | **Base Resolution** | 1024x1024 (supports flexible resolutions) |
- | **Max Resolution** | 2048x2048+ (hardware dependent) |
- | **Text Encoders** | CLIP-L, CLIP-G, T5-XXL |
- | **Inference Steps** | 20-100 (50 recommended) |
- | **Guidance Scale** | 7.0-9.0 (7.5 recommended) |
-
- **Supported Features**:
- - Text-to-image generation
- - Complex prompt understanding
- - Multi-aspect ratio generation
- - Img2img workflows
- - Inpainting and outpainting
- - ControlNet compatibility
- - LoRA fine-tuning support
-
- ## Performance Tips & Optimization
-
- **Speed Optimization**:
- - Use 20-30 inference steps for faster generation (slight quality trade-off)
- - Enable `xformers` or `torch.compile()` for attention optimization
- - Reduce guidance scale to 6.0-7.0 for faster convergence
- - Use lower resolutions (512x512, 768x768) for draft iterations
-
- **Memory Optimization**:
- - Enable CPU offloading: `pipe.enable_model_cpu_offload()`
- - Enable attention slicing: `pipe.enable_attention_slicing()`
- - Enable VAE slicing: `pipe.enable_vae_slicing()`
- - Use sequential CPU offload for extreme memory constraints
- - Consider switching to FP8 version for 50% memory reduction
-
- **Quality Optimization**:
- - Use 50-75 inference steps for maximum quality
- - Guidance scale 7.5-8.5 for strong prompt adherence
- - Add negative prompts to avoid common artifacts
- - Use higher resolutions (1536x1024, 2048x2048) for detail
- - Experiment with different samplers (DPM++, Euler a)
-
- **Workflow Optimization**:
- - Pre-load models at startup to avoid repeated loading
- - Batch generate similar prompts for efficiency
- - Cache text encoder outputs for prompt variations
- - Use FP16 mixed precision training for fine-tuning

  ## License

- FLUX.1-dev is licensed under the **Apache License 2.0**.

  **Usage Terms**:
- - Free for personal, research, and commercial use
- - Attribution to Black Forest Labs appreciated
- - No warranty provided, use at your own risk
- - See official license documentation for full terms

- **Ethical Use Guidelines**:
- - Do not generate harmful, illegal, or unethical content
- - Respect copyright and intellectual property
- - Follow platform-specific content policies
- - Consider social impact of generated media

  ## Citation

- If you use FLUX.1-dev in your research or projects, please cite:

  ```bibtex
- @software{flux1_dev_2024,
-   title = {FLUX.1-dev: High-Quality Text-to-Image Generation},
-   author = {Black Forest Labs},
-   year = {2024},
-   url = {https://huggingface.co/black-forest-labs/FLUX.1-dev},
-   note = {FP16 precision version}
  }
  ```

- ## Links & Resources

- **Official Resources**:
- - Original Model: [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
- - Black Forest Labs: [https://blackforestlabs.ai](https://blackforestlabs.ai)
- - Documentation: [FLUX.1 Technical Documentation](https://blackforestlabs.ai/docs)

- **Community & Support**:
- - Hugging Face Diffusers: [https://github.com/huggingface/diffusers](https://github.com/huggingface/diffusers)
- - Community Forum: [Hugging Face Forums](https://discuss.huggingface.co/)
- - ComfyUI Integration: [ComfyUI FLUX Nodes](https://github.com/comfyanonymous/ComfyUI)

- **Related Models**:
- - FLUX.1-schnell (Fast version): [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)
- - FLUX.1-dev FP8 (Memory efficient): Available in sibling repository

  ---

- **Model Version**: FLUX.1-dev
- **Precision**: FP16
- **Repository Version**: v1.2
- **Last Updated**: 2025-10-14

  - fp16
  ---

+ <!-- README Version: v1.4 -->

+ # FLUX.1-dev FP16

+ High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.

  ## Model Description

+ FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version keeps the weights at their released 16-bit precision with no further quantization, ideal for creative professionals and researchers requiring the highest image quality.

  **Key Capabilities**:
+ - High-resolution text-to-image generation
+ - Advanced prompt understanding with the T5-XXL text encoder
+ - Superior detail and coherence in generated images
+ - Wide range of artistic styles and subjects
+ - Multi-text-encoder architecture (CLIP + T5)

  ## Repository Contents

  ```
  flux-dev-fp16/
  ├── checkpoints/flux/
+ │   └── flux1-dev-fp16.safetensors     # 23 GB - Complete model checkpoint
  ├── clip/
+ │   └── t5xxl_fp16.safetensors         # 9.2 GB - T5-XXL text encoder
  ├── clip_vision/
+ │   └── clip_vision_h.safetensors      # 1.2 GB - CLIP vision encoder
+ ├── diffusion_models/flux/
+ │   └── flux1-dev-fp16.safetensors     # 23 GB - Diffusion model
+ ├── text_encoders/
+ │   ├── clip-vit-large.safetensors     # 1.6 GB - CLIP ViT-Large encoder
+ │   ├── clip_g.safetensors             # 1.3 GB - CLIP-G encoder
+ │   ├── clip_l.safetensors             # 235 MB - CLIP-L encoder
+ │   └── t5xxl_fp16.safetensors         # 9.2 GB - T5-XXL encoder
  └── vae/flux/
+     └── flux-vae-bf16.safetensors      # 160 MB - VAE decoder (BF16)

+ Total Size: ~72 GB
  ```

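+ A quick way to confirm a download matches the layout above (an
+ illustrative sketch; the root path is this repo's example location):
+
+ ```python
+ import os
+
+ root = "E:/huggingface/flux-dev-fp16"
+ for dirpath, _, filenames in os.walk(root):
+     for name in filenames:
+         if name.endswith(".safetensors"):
+             path = os.path.join(dirpath, name)
+             # Report each weight file and its size in GB
+             print(f"{os.path.relpath(path, root)}: "
+                   f"{os.path.getsize(path) / 1024**3:.2f} GB")
+ ```
+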
  ## Hardware Requirements

+ ### Minimum Requirements
+ - **VRAM**: 24 GB (RTX 3090, RTX 4090, A5000)
  - **RAM**: 32 GB system memory
+ - **Disk Space**: 80 GB free space
+ - **GPU**: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

+ ### Recommended Requirements
+ - **VRAM**: 32+ GB (RTX 6000 Ada, A6000, H100)
  - **RAM**: 64 GB system memory
+ - **Disk Space**: 100+ GB for workspace and outputs
+ - **GPU**: RTX 6000 Ada or comparable professional GPUs

+ ### Performance Notes
+ - FP16 precision provides the best quality but the highest VRAM usage; a pre-flight VRAM check is sketched below
+ - Consider the FP8 version if VRAM is limited (see the `flux-dev-fp8` directory)
+ - Generation time: ~30-60 seconds per image at 1024x1024, depending on GPU

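+ A minimal pre-flight check against the minimum above (the 24 GB figure
+ comes from this section; device index 0 is an assumption):
+
+ ```python
+ import torch
+
+ props = torch.cuda.get_device_properties(0)
+ vram_gb = props.total_memory / 1024**3
+ print(f"{props.name}: {vram_gb:.1f} GB VRAM")
+ if vram_gb < 24:
+     print("Below the 24 GB minimum for FP16 - consider the FP8 build")
+ ```
+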
  ## Usage Examples

+ ### Using with Diffusers Library

  ```python
  import torch
  from diffusers import FluxPipeline

+ # Load the pipeline from the single-file checkpoint. This repo uses a
+ # ComfyUI-style layout of bare .safetensors files, so from_single_file
+ # is the right entry point; from_pretrained expects a diffusers model
+ # directory containing model_index.json.
+ pipe = FluxPipeline.from_single_file(
+     "E:/huggingface/flux-dev-fp16/checkpoints/flux/flux1-dev-fp16.safetensors",
      torch_dtype=torch.float16
  )
+ pipe = pipe.to("cuda")

  # Generate an image
+ prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
  image = pipe(
      prompt=prompt,
      num_inference_steps=50,

  image.save("output.png")
  ```

+ ### Using with ComfyUI
+
+ 1. Copy model files to the ComfyUI directories (a copy script is sketched below):
+    - `checkpoints/flux/flux1-dev-fp16.safetensors` → `ComfyUI/models/checkpoints/`
+    - `text_encoders/*.safetensors` → `ComfyUI/models/clip/`
+    - `vae/flux/flux-vae-bf16.safetensors` → `ComfyUI/models/vae/`
+
+ 2. In ComfyUI:
+    - Load Checkpoint: Select `flux1-dev-fp16`
+    - Text Encoder: Loaded automatically with the checkpoint
+    - VAE: Select `flux-vae-bf16`
+
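+ A minimal copy sketch for step 1 (both root paths are examples; adjust
+ them to your installation):
+
+ ```python
+ import glob
+ import shutil
+
+ src = "E:/huggingface/flux-dev-fp16"
+ dst = "C:/ComfyUI/models"  # assumed ComfyUI install location
+
+ # Checkpoint, text encoders, and VAE go to their ComfyUI folders
+ shutil.copy2(f"{src}/checkpoints/flux/flux1-dev-fp16.safetensors", f"{dst}/checkpoints/")
+ for f in glob.glob(f"{src}/text_encoders/*.safetensors"):
+     shutil.copy2(f, f"{dst}/clip/")
+ shutil.copy2(f"{src}/vae/flux/flux-vae-bf16.safetensors", f"{dst}/vae/")
+ ```
+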
+ ### Using Individual Components

  ```python
+ import torch
+ from diffusers import AutoencoderKL
+ from safetensors.torch import load_file
+
+ # The text encoder files here are bare state dicts, so read them with
+ # safetensors. (transformers' from_pretrained has no `filename` argument
+ # and expects a full model directory; turning these weights into a usable
+ # T5/CLIP module also requires the matching config from the upstream repos.)
+ t5_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/t5xxl_fp16.safetensors")
+ clip_state = load_file("E:/huggingface/flux-dev-fp16/text_encoders/clip_l.safetensors")
+
+ # The VAE can be loaded directly from its single checkpoint file
+ vae = AutoencoderKL.from_single_file(
+     "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
+     torch_dtype=torch.bfloat16,
+ )
  ```

+ ## Model Specifications

+ **Architecture**:
+ - **Type**: Latent Diffusion Transformer
+ - **Parameters**: ~12B (diffusion model)
+ - **Text Encoders**:
+   - T5-XXL: 4.7B parameters (FP16, 9.2 GB)
+   - CLIP-G: 1.3 GB (FP16)
+   - CLIP-L: 235 MB (FP16)
+ - **VAE**: BF16 precision (160 MB)

+ **Precision**:
+ - **Diffusion Model**: FP16 (float16)
+ - **Text Encoders**: FP16 (float16)
+ - **VAE**: BF16 (bfloat16)

+ **Format**:
+ - `.safetensors` - secure tensor format with fast loading

+ **Resolution Support**:
+ - Native: 1024x1024
+ - Range: 512x512 to 2048x2048
+ - Aspect ratios: non-square resolutions supported (see the sketch below)

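+ A minimal non-square render, reusing `pipe` from the Diffusers example
+ above (the 1344x768 pair is an illustrative widescreen choice, not a
+ documented preset):
+
+ ```python
+ image = pipe(
+     prompt="Wide-angle alpine valley at dawn, soft morning haze",
+     height=768,
+     width=1344,  # dimensions are commonly kept to multiples of 16
+     num_inference_steps=50,
+ ).images[0]
+ image.save("widescreen.png")
+ ```
+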
+ ## Performance Tips

+ ### Memory Optimization
+ ```python
+ # Enable sliced (memory-efficient) attention
+ pipe.enable_attention_slicing()
+
+ # Enable VAE tiling for high resolutions
+ pipe.enable_vae_tiling()
+
+ # Use CPU offloading if VRAM is limited (slower; call this instead of pipe.to("cuda"))
+ pipe.enable_sequential_cpu_offload()
  ```

+ ### Speed Optimization
+ ```python
+ # Use torch.compile for faster inference (PyTorch 2.0+).
+ # FLUX uses a transformer backbone; FluxPipeline has no `unet` attribute.
+ pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
+
+ # Reduce inference steps (trade quality for speed; the examples above use 50)
+ image = pipe(prompt, num_inference_steps=25).images[0]
+ ```

+ ### Quality Optimization
+ - Use 50-75 inference steps for best quality
+ - Guidance scale: 7-9 for balanced results
+ - Higher guidance (10-15) for stronger prompt adherence
+ - Consider prompt engineering for better results; an example of quality-leaning settings follows

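+ A short sketch of quality-leaning settings, with values taken from the
+ list above (the prompt is illustrative):
+
+ ```python
+ image = pipe(
+     prompt="Portrait of a clockmaker in a sunlit workshop, 35mm photo",
+     num_inference_steps=60,  # within the 50-75 quality range
+     guidance_scale=8.0,      # 7-9 balanced; push past 10 for stricter adherence
+ ).images[0]
+ image.save("quality.png")
+ ```
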
  ## License

+ This model is distributed under the **FLUX.1 [dev] Non-Commercial License**, not Apache 2.0 (Apache 2.0 covers the separate FLUX.1-schnell model).

  **Usage Terms**:
+ - ✅ Personal, research, and other non-commercial use allowed
+ - ✅ Generated outputs may be used as described in the license
+ - ⚠️ Commercial use of the model weights requires a separate agreement with Black Forest Labs
+ - ⚠️ No warranty provided; use at your own risk

+ See the FLUX.1 [dev] Non-Commercial License for full terms.

  ## Citation

+ If you use this model in your research or projects, please cite:

  ```bibtex
+ @misc{flux-dev,
+   title={FLUX.1-dev: High-Quality Text-to-Image Generation},
+   author={Black Forest Labs},
+   year={2024},
+   howpublished={\url{https://blackforestlabs.ai/}}
  }
  ```

+ ## Related Resources

+ - **Official Website**: https://blackforestlabs.ai/
+ - **Model Card**: https://huggingface.co/black-forest-labs/FLUX.1-dev
+ - **Documentation**: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
+ - **Community**: https://huggingface.co/black-forest-labs

+ ## Version Information

+ - **Model Version**: FLUX.1-dev
+ - **Precision**: FP16
+ - **Release**: 2024
+ - **README Version**: v1.4

  ---

+ For the FP8 precision version (lower VRAM usage), see `E:/huggingface/flux-dev-fp8/`