---
license: apache-2.0
datasets:
- Jonathan-Zhou/GameLabel-10k
base_model:
- black-forest-labs/FLUX.1-schnell
pipeline_tag: text-to-image
---
# Flux GameLabel Lora

This model is intended purely for research purposes, as a demonstration of the quality of data labeled by random video game players. It achieves its purpose (higher prompt adherence), but suffers from a variety of issues because it was fine-tuned on synthetic outputs.

Inference code that runs on a consumer GPU with 24 GB of VRAM is below. More details are in the paper at [https://arxiv.org/abs/2409.19830](https://arxiv.org/abs/2409.19830).


```python3
from diffusers import FlowMatchEulerDiscreteScheduler, AutoencoderKL
from diffusers.models.transformers.transformer_flux import FluxTransformer2DModel
from diffusers.pipelines.flux.pipeline_flux import FluxPipeline
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
import torch
from huggingface_hub import hf_hub_download
from torchao.quantization.quant_api import (
    quantize_,
    int8_weight_only
)
dtype = torch.bfloat16
flux_repo = "black-forest-labs/FLUX.1-schnell"
revision = "refs/pr/1"

# Tokenizers hold no weights, so they do not take a torch_dtype
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
tokenizer_2 = T5TokenizerFast.from_pretrained(flux_repo, subfolder="tokenizer_2", revision=revision)
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(flux_repo, subfolder="scheduler", revision=revision)
transformer = FluxTransformer2DModel.from_pretrained(flux_repo, subfolder="transformer", torch_dtype=dtype, revision=revision)
lora_file_path = hf_hub_download(repo_id = "Jonathan-Zhou/Flux-GameLabel-Lora", filename = "lora.safetensors")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=dtype)
text_encoder_2 = T5EncoderModel.from_pretrained(flux_repo, subfolder="text_encoder_2", torch_dtype=dtype, revision=revision)
vae = AutoencoderKL.from_pretrained(flux_repo, subfolder="vae", torch_dtype=dtype, revision=revision)


pipe = FluxPipeline(
    scheduler=scheduler,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    text_encoder_2=text_encoder_2,
    tokenizer_2=tokenizer_2,
    vae=vae,
    transformer=transformer,
)

# To compare the LoRA with the base model, comment out the next two lines
pipe.load_lora_weights(lora_file_path, adapter_name="lora1")
pipe.fuse_lora()

# int8 weight-only quantization is needed to fit on a GPU with 24 GB of VRAM
quantize_(transformer, int8_weight_only())
quantize_(text_encoder, int8_weight_only())
quantize_(text_encoder_2, int8_weight_only())
quantize_(vae, int8_weight_only())


pipe.to("cuda")
torch.cuda.empty_cache()
generator = torch.Generator().manual_seed(12345)
output = pipe(
    prompt="a man showing off his cool new t shirt at the beach, a shark is jumping out of the water in the background",
    width=1024,
    height=1024,
    num_inference_steps=6,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=3.5,
)
image = output.images[0]
image.show()
```
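
To see why the int8 quantization step matters on a 24 GB card, here is a rough back-of-the-envelope estimate of weight memory. It is only a sketch: the parameter counts in `APPROX_PARAMS` are approximate figures assumed for illustration (they are not measured from this checkpoint), and the estimate ignores activations, quantization scales, and any layers that stay unquantized.

```python
# Rough weight-memory estimate for the pipeline components.
# All parameter counts are approximate ASSUMPTIONS, not measured values.
APPROX_PARAMS = {
    "transformer": 12e9,      # FLUX.1 transformer, roughly 12B parameters
    "text_encoder_2": 4.7e9,  # T5-XXL encoder, roughly 4.7B parameters
    "text_encoder": 0.12e9,   # CLIP ViT-L/14 text encoder, roughly 0.12B
    "vae": 0.08e9,            # autoencoder, roughly 0.08B
}

# Bytes used to store one weight in each format
BYTES_PER_PARAM = {"bfloat16": 2, "int8": 1}


def weight_gib(params: float, dtype: str) -> float:
    """Memory in GiB needed to store `params` weights in `dtype`."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


def total_weight_gib(dtype: str) -> float:
    """Total weight memory in GiB across all components in `dtype`."""
    return sum(weight_gib(p, dtype) for p in APPROX_PARAMS.values())


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{total_weight_gib(dtype):.1f} GiB of weights")
```

Under these assumed counts, the bfloat16 weights alone exceed 24 GiB, while the int8 weights leave some headroom for activations.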