Ugly Kontext: a FLUX.2 Klein 4B image-edit LoRA
A small image-edit LoRA trained on top of FLUX.2-klein-base-4B that transforms photos of cats, dogs, and small animal groups into deliberately crude "ugly sketch" versions. Trained on 120 paired (input photo, output sketch) examples using ai-toolkit.
Quick start
import torch
from PIL import Image
from diffusers import Flux2KleinPipeline

# Load the distilled 4-step model; the LoRA is applied on top of it.
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Register the LoRA under a named adapter so its scale can be changed later.
pipe.load_lora_weights(
    "stephenbtl/ugly-kontext-klein-4b-lora",
    weight_name="ugly_kontext_klein_4b_v1.safetensors",
    adapter_name="ugly",
)
pipe.set_adapters(["ugly"], adapter_weights=[1.0])  # LoRA scale

# The reference photo is resized to match the requested output size.
reference = Image.open("your_pet.jpg").convert("RGB").resize((1024, 1024))

result = pipe(
    prompt="change the photo the cat into an ugly sketch of the same cat",
    image=reference,
    height=1024, width=1024,
    num_inference_steps=4,
    guidance_scale=4.0,
    generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for reproducibility
).images[0]
result.save("ugly.jpg")
The LoRA is trained on the base model (FLUX.2-klein-base-4B) but runs on the distilled FLUX.2-klein-4B (4 steps), as shown above. Applying the LoRA to the distilled model is faster and typically gives better results than applying it to the base model.
Klein expects width and height divisible by 16, with (W·H)/256 ≤ 4096.
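The size rule above can be checked before calling the pipeline. The following is a minimal sketch, not part of the repo: `is_valid_klein_size` and `fit_klein_size` are hypothetical helper names, and the shrink-then-floor strategy is just one reasonable way to satisfy the rule.

```python
def is_valid_klein_size(width: int, height: int) -> bool:
    """Check the stated rule: both sides divisible by 16 and (W*H)/256 <= 4096."""
    return (
        width > 0
        and height > 0
        and width % 16 == 0
        and height % 16 == 0
        and (width * height) / 256 <= 4096
    )


def fit_klein_size(width: int, height: int) -> tuple[int, int]:
    """Hypothetical helper: shrink (never enlarge) to fit the area budget,
    then floor each side to a multiple of 16."""
    max_area = 4096 * 256  # 1,048,576 pixels, e.g. 1024 x 1024
    scale = min(1.0, (max_area / (width * height)) ** 0.5)
    new_w = max(16, int(width * scale) // 16 * 16)
    new_h = max(16, int(height * scale) // 16 * 16)
    return new_w, new_h
```

For example, `fit_klein_size(1024, 1024)` leaves the size unchanged, while a 1920×1080 photo is reduced to a smaller size that passes `is_valid_klein_size`. Flooring changes the aspect ratio slightly, and extreme aspect ratios (one side far below 16 px after scaling) are not handled by this sketch.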
Edit LoRAs in one paragraph
A text-to-image LoRA teaches the model an adjective, a style or concept, from (image, description) pairs and renders from the prompt alone. An image-edit LoRA teaches the model a verb, a transformation, from (input image, output image, edit instruction) triples; at inference you pass both an image and a prompt, and the model applies the trained transformation while preserving the input's identity. Klein supports both modes in the same pipeline: Flux2KleinPipeline is text-to-image when you don't pass an image=, and image-edit when you do.
Repo contents
| Path | What it is |
|---|---|
| ugly_kontext_klein_4b_v1.safetensors | Final LoRA, step 2000. |
| checkpoints/*.safetensors | Intermediate checkpoints at 250-step intervals (250 to 1750). Useful for picking a sweet spot before over-training. |
| samples/ | 27 training-time samples: three prompts (cat / dog / animal group) at every checkpoint. Pure text-to-image (no control image), so they show style progression rather than edit behaviour. |
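The exact checkpoint filenames are not listed here, so one way to discover them is to query the repo. The sketch below assumes `huggingface_hub` is installed; `checkpoint_paths` and `list_checkpoints` are hypothetical helper names, while `list_repo_files` is the library's own function.

```python
from fnmatch import fnmatch


def checkpoint_paths(repo_files):
    """Return the paths matching checkpoints/*.safetensors, sorted as strings."""
    return sorted(p for p in repo_files if fnmatch(p, "checkpoints/*.safetensors"))


def list_checkpoints(repo_id="stephenbtl/ugly-kontext-klein-4b-lora"):
    """List intermediate checkpoints in the repo (needs network access)."""
    # Imported lazily so checkpoint_paths() can be used without the dependency.
    from huggingface_hub import list_repo_files

    return checkpoint_paths(list_repo_files(repo_id))
```

A path returned by `list_checkpoints()` should be usable as `weight_name` in `pipe.load_lora_weights(...)`, or split into `subfolder` and `weight_name` if the combined path is not accepted. The sort is lexicographic, so it may not follow step order if the step numbers in the filenames are not zero-padded.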
Training details
| Setting | Value |
|---|---|
| Base model | black-forest-labs/FLUX.2-klein-base-4B (Apache 2.0) |
| Trainer | ostris/ai-toolkit, arch: flux2_klein_4b |
| Dataset | 120 paired (reference, target, caption) examples: cats, dogs, small animal groups |
| Network | LoRA, linear: 128, linear_alpha: 64, conv: 64, conv_alpha: 32 |
| Optimiser | adamw8bit, lr: 1e-4, weight_decay: 1.5e-4 |
| Scheduler | flow-match, timestep_type: shift, content_or_style: balanced |
| Resolution | 512² (matches dataset native resolution) |
| Steps | 2000, batch_size: 1, gradient checkpointing |
| Hardware | RTX 4090 (24 GB), quantize: true |
| Wall time | ~35 min |
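Two derived numbers help put the table in context: how many passes over the dataset 2000 steps amount to, and the alpha/rank ratio of the network. This is a back-of-envelope sketch; it assumes no gradient accumulation and the common LoRA convention of scaling the update by alpha / rank, neither of which is stated in the table.

```python
# Values copied from the training table.
dataset_size = 120
steps = 2000
batch_size = 1
linear_rank, linear_alpha = 128, 64
conv_rank, conv_alpha = 64, 32

# Each step sees batch_size examples (assumption: no gradient accumulation).
epochs = steps * batch_size / dataset_size
epochs_per_checkpoint = 250 * batch_size / dataset_size

# Assumption: the usual LoRA convention of scaling the update by alpha / rank.
linear_scale = linear_alpha / linear_rank
conv_scale = conv_alpha / conv_rank

print(f"approx. epochs: {epochs:.1f}")
print(f"approx. epochs between checkpoints: {epochs_per_checkpoint:.1f}")
print(f"alpha/rank (linear, conv): {linear_scale}, {conv_scale}")
```

Under those assumptions the run covers roughly 17 passes over the 120 examples, about two per saved checkpoint, with an alpha/rank ratio of 0.5 for both the linear and conv layers.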
Tips
- Match the dataset's prompt shape. Captions follow "change the photo the X into an ugly sketch of the same X". Sticking close to that wording produces the most reliable activation; very different phrasings work less well because the dataset has only 4 unique caption strings.
- LoRA scale 0.7 to 1.0 is the sweet spot. Below 0.5 you keep more of Klein-base's polish; above 1.0 the transformation dominates and you start to amplify dataset quirks (extra eyes, etc.).
- Try the step-1500 checkpoint before reaching for the final. Edit LoRAs over-train fast on small datasets, and earlier checkpoints often preserve subject identity better.
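The first two tips can be combined into a small comparison loop. This is a sketch only: `make_prompt` and `sweep_lora_scale` are hypothetical helpers, and `pipe` is assumed to be a pipeline set up as in Quick start, with the LoRA already loaded under the adapter name "ugly".

```python
def make_prompt(subject: str) -> str:
    """Fill the caption template quoted in the tips (wording kept verbatim)."""
    return f"change the photo the {subject} into an ugly sketch of the same {subject}"


def sweep_lora_scale(pipe, image, subject, scales=(0.7, 0.85, 1.0),
                     adapter_name="ugly", **call_kwargs):
    """Run the same edit at several LoRA scales and return {scale: image}.

    Extra keyword arguments (height, width, num_inference_steps, ...) are
    forwarded unchanged to the pipeline call.
    """
    results = {}
    for scale in scales:
        # Same call as in Quick start, with a different adapter weight each time.
        pipe.set_adapters([adapter_name], adapter_weights=[scale])
        results[scale] = pipe(
            prompt=make_prompt(subject), image=image, **call_kwargs
        ).images[0]
    return results
```

A call such as `sweep_lora_scale(pipe, reference, "cat", height=1024, width=1024, num_inference_steps=4)` would return one image per scale. For a like-for-like comparison, seed each call identically; a single generator shared across calls advances between them, so the noise would differ from one scale to the next.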
Limitations
- Caption monotony. Only 4 unique caption strings across 120 examples. The model learns the transformation well, but generalisation to differently-worded prompts is limited.
- Subject distribution. Cats, dogs, and small animal groups dominate the training data. People, buildings, food, and other subjects work to varying degrees but show identity drift more often.
- Aesthetic. The "ugly" look is intentional and quirky; it's not a polished sketch style.
- Licence inheritance. The LoRA inherits Apache 2.0 from the base model.
Licence
Apache 2.0 (inherits from FLUX.2-klein-base-4B).