FLUX Packaging LoRA: Indian Snacks Domain

A LoRA adaptation of FLUX.1-schnell fine-tuned for the Indian snack packaging visual domain.

Part of the MSc dissertation "Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models: A Comparative Study of SDXL and FLUX with LoRA and IP-Adapter Conditioning", University of Stirling, MSc Artificial Intelligence, 2026.

Model details

Two LoRA checkpoints are provided:

| File | Use | Rank | Steps | Resolution |
|---|---|---|---|---|
| flux_packaging_lora_r16_res1024_steps2000.safetensors | Primary: used for the SDXL-vs-FLUX comparison in the dissertation | 16 | 2000 | 1024 × 1024 |
| flux_packaging_lora_r16_res512_steps1000.safetensors | Supplementary: produced as a robustness check during infrastructure resolution | 16 | 1000 | 512 × 512 |
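Both filenames encode their rank, training resolution and step count. The sketch below reads those values back out, for example to match the generation width and height to the checkpoint's training resolution. The `r<rank>_res<resolution>_steps<steps>` pattern is inferred from the two released files and is an assumption, as is the helper name `parse_checkpoint_name`.

```python
import re

# Assumed naming pattern, inferred from the two released files:
# flux_packaging_lora_r<rank>_res<resolution>_steps<steps>.safetensors
CHECKPOINT_PATTERN = re.compile(
    r"_r(?P<rank>\d+)_res(?P<resolution>\d+)_steps(?P<steps>\d+)\.safetensors$"
)


def parse_checkpoint_name(filename):
    """Return the rank, training resolution and step count encoded in a filename."""
    match = CHECKPOINT_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {filename}")
    return {key: int(value) for key, value in match.groupdict().items()}


for name in (
    "flux_packaging_lora_r16_res1024_steps2000.safetensors",
    "flux_packaging_lora_r16_res512_steps1000.safetensors",
):
    print(name, parse_checkpoint_name(name))
```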

Shared training configuration:

| Property | Value |
|---|---|
| Base model | black-forest-labs/FLUX.1-schnell |
| Learning rate | 5e-5 |
| Trigger token | ipsnackpkg |
| Precision | bfloat16 |
| Training hardware | NVIDIA A100 (40 GB) on Google Colab Pro |
| Wall-clock training time (primary) | ≈ 3 h 40 min |

The FLUX learning rate (5e-5) is lower than that of the SDXL counterpart (1e-4) to account for FLUX's greater sensitivity to gradient magnitude.

Pinned dependency configuration

FLUX LoRA training in the diffusers ecosystem required pinning a specific dependency set due to incompatibilities on the diffusers main branch:

diffusers==0.32.0
transformers==4.45.2
peft==0.13.2
accelerate==1.1.1

Reproducing training requires this pinned set; see the dissertation methodology log for full context.
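Before reproducing training, it can help to confirm that the active environment actually matches the pinned set. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `find_mismatches` is hypothetical and not part of the dissertation code.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned set reported in this card.
PINNED = {
    "diffusers": "0.32.0",
    "transformers": "4.45.2",
    "peft": "0.13.2",
    "accelerate": "1.1.1",
}


def find_mismatches(pinned, installed_lookup=version):
    """Return {package: (expected, found)} for packages not at their pinned version.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = installed_lookup(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    if problems:
        for name, (expected, found) in problems.items():
            print(f"{name}: expected {expected}, found {found}")
    else:
        print("Pinned dependency set satisfied.")
```

The `installed_lookup` parameter exists only so the check can be exercised without modifying the real environment.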

Training data

311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence). This is the same training corpus used for the SDXL counterpart LoRA. Per-image provenance is preserved in the code repository as data/packaging_metadata.csv.

Intended use

Research use in studying base-model contribution to packaging-domain image generation. The dissertation's RQ1 asks whether fine-tuned FLUX produces superior packaging generation compared to fine-tuned SDXL under comparable LoRA configurations. This model is the FLUX side of that comparison.

How to use

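from diffusers import FluxPipeline
import torch

# Load the FLUX.1-schnell base model in bfloat16, the precision used for training.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load the primary 1024 x 1024 checkpoint under an explicit adapter name,
# rather than relying on the auto-generated name "default_0".
pipe.load_lora_weights(
    "Vclord/flux-packaging-lora-indian-snacks",
    weight_name="flux_packaging_lora_r16_res1024_steps2000.safetensors",
    adapter_name="packaging",
)
# Apply the recommended LoRA scale of 0.5.
pipe.set_adapters(["packaging"], adapter_weights=[0.5])

# Include the trigger token "ipsnackpkg" in the prompt.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet"
image = pipe(
    prompt,
    num_inference_steps=4,    # schnell is distilled for few-step sampling
    guidance_scale=0.0,       # schnell does not use classifier-free guidance
    width=1024,               # match the primary checkpoint's training resolution
    height=1024,
    max_sequence_length=256,  # longest T5 prompt length schnell supports
).images[0]
image.save("output.png")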

Recommended LoRA scale: 0.5

Why 0.5 and not 1.0?

Unlike SDXL LoRAs, which are conventionally used at scale 1.0, this FLUX LoRA performs best at scale 0.5. A diagnostic comparison at scales 0.3, 0.5, and 1.0 showed that at scale 1.0 the LoRA over-asserts on FLUX outputs, producing hazy, ghosted packets, a known phenomenon in the FLUX LoRA community. Scale 0.5 preserves the trained LoRA contribution without inducing this over-assertion failure mode.
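To make concrete what the scale controls: in PEFT-backed LoRA layers, as used by diffusers, a layer computes `W x + s * (alpha / r) * B (A x)`, where `s` is the adapter weight passed to `set_adapters`. The NumPy sketch below is a simplified single-layer illustration, not FLUX itself: the layer sizes and `alpha` are assumptions (the card does not state `lora_alpha`), and only `r = 16` comes from the released checkpoints. It shows that scale 0.5 exactly halves the LoRA update; it does not by itself explain why scale 1.0 produces ghosting.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, scale):
    """Linear layer with a LoRA update: W x + scale * (alpha / r) * B (A x)."""
    r = A.shape[0]  # LoRA rank
    return W @ x + scale * (alpha / r) * (B @ (A @ x))


rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 16  # r = 16 as in the released checkpoints; layer sizes are illustrative
alpha = 16.0                 # assumption: lora_alpha is not stated in this card
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = rng.standard_normal((d_out, r))     # LoRA up-projection
x = rng.standard_normal(d_in)

base = W @ x
for s in (0.3, 0.5, 1.0):  # the scales compared in the diagnostic
    delta = lora_forward(x, W, A, B, alpha, s) - base
    print(f"scale={s}: ||LoRA delta|| = {np.linalg.norm(delta):.3f}")
```

Because the update is linear in `s`, the base model's weights are untouched and only the magnitude of the learned correction changes.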

Evaluation

The FLUX vs SDXL comparison was conducted as a LoRA-only experiment (no IP-Adapter, no ControlNet) because mature FLUX equivalents of those components were not available at the time of writing. The comparison therefore answers a narrower sub-question of RQ1: whether FLUX is a better base model for the packaging-domain LoRA task in isolation.

Quantitative metrics across 24 comparison images (3 prompts × 2 seeds × 4 conditions):
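The 24-image design is a full factorial grid, so each condition contributes 3 × 2 = 6 images, which are presumably what each metric row summarises. The sketch below enumerates that grid. The condition labels are taken from the table; the prompt and seed values are hypothetical placeholders, since the card gives only one example prompt and does not list the seeds.

```python
from collections import Counter
from itertools import product

CONDITIONS = [
    "SDXL baseline (no LoRA)",
    "SDXL + LoRA + Plus + ControlNet (full pipeline)",
    "FLUX baseline (no LoRA)",
    "FLUX + LoRA at scale 0.5",
]
PROMPTS = ["prompt_a", "prompt_b", "prompt_c"]  # hypothetical placeholders for the 3 prompts
SEEDS = [1, 2]                                  # hypothetical placeholders for the 2 seeds

# One entry per generated image: (prompt, seed, condition).
grid = list(product(PROMPTS, SEEDS, CONDITIONS))
images_per_condition = Counter(condition for _, _, condition in grid)

print(len(grid))                             # 24
print(images_per_condition[CONDITIONS[0]])   # 6
```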

| Configuration | CLIP-img ↑ | CLIP-txt ↑ | LPIPS ↓ |
|---|---|---|---|
| SDXL baseline (no LoRA) | not reported | not reported | not reported |
| SDXL + LoRA + Plus + ControlNet (full pipeline) | 0.552 | 0.320 | 0.782 |
| FLUX baseline (no LoRA) | 0.475 | 0.255 | 0.795 |
| FLUX + LoRA at scale 0.5 | 0.528 | 0.306 | 0.665 |

Intra-rater reliability for the FLUX comparison spike (n = 24), measured as Cohen's weighted kappa with linear weights:

| Axis | κ |
|---|---|
| Text legibility | 0.740 |
| Packaging plausibility | 0.559 |
| Visual quality | 0.554 |

(Regional appropriateness was not scored for this spike because the FLUX comparison prompts were not folk-art conditioned.)
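Intra-rater reliability compares two scoring passes by the same rater over the same images. The sketch below computes Cohen's kappa with linear weights from scratch, to show how partial disagreements on an ordinal scale are penalised in proportion to their distance. The 1 to 5 scale and the example scores are hypothetical; the card does not state the rating scale.

```python
def linear_weighted_kappa(first_pass, second_pass, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    first_pass and second_pass are equal-length sequences of ordinal scores;
    categories lists every possible score in order (k >= 2).
    Raises ZeroDivisionError when expected disagreement is zero
    (e.g. every score falls in a single category).
    """
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("rating sequences must be non-empty and equal in length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first_pass)

    # Observed joint distribution of (first pass, second pass) scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first_pass, second_pass):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginals define the joint distribution expected under chance agreement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for maximal disagreement
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-5 scale from two passes by the same rater.
pass_1 = [5, 4, 4, 3, 2, 5, 1, 3]
pass_2 = [5, 4, 3, 3, 2, 4, 1, 3]
print(round(linear_weighted_kappa(pass_1, pass_2, [1, 2, 3, 4, 5]), 3))
```

For real analyses, `sklearn.metrics.cohen_kappa_score(a, b, weights="linear")` computes the same quantity; the unnormalised weights it uses differ only by the constant factor `k - 1`, which cancels in the ratio.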

Headline finding: FLUX + LoRA at scale 0.5 achieves the lowest LPIPS distance to real packaging of all configurations tested (0.665), although the SDXL full pipeline scores higher on both CLIP metrics. This suggests that base-model choice contributes more to packaging-domain quality than the specific fine-tuning strategy. The finding is bounded by the LoRA-only comparison scope; the full-pipeline comparison is future work.

Limitations

  • LoRA-only configuration: no IP-Adapter or ControlNet conditioning is applied during inference with this model, so folk-art style transfer is not part of the FLUX pipeline at the time of writing.
  • Trained on a small dataset (311 images); generalisation beyond Indian snack packaging has not been characterised.
  • The 1024-resolution LoRA is the primary deliverable. The 512-resolution LoRA was produced during infrastructure resolution; it behaves similarly at scale 0.5 but is not the main artefact.
  • Evaluation used a single rater, with consistency assessed through an intra-rater reliability protocol; see the dissertation for full discussion.

Citation

If you use this LoRA in research, please cite:

@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}

Companion repository and SDXL counterpart

Licence

apache-2.0
