FLUX Packaging LoRA — Indian Snacks Domain

A LoRA adaptation of FLUX.1-schnell fine-tuned for the Indian snack packaging visual domain.

Part of the MSc dissertation "Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models: A Comparative Study of SDXL and FLUX with LoRA and IP-Adapter Conditioning" — University of Stirling, MSc Artificial Intelligence, 2026.

Model details

Two LoRA checkpoints are provided:

File	Use	Rank	Steps	Resolution
`flux_packaging_lora_r16_res1024_steps2000.safetensors`	Primary — used for the SDXL-vs-FLUX comparison in the dissertation	16	2000	1024 × 1024
`flux_packaging_lora_r16_res512_steps1000.safetensors`	Supplementary — produced as a robustness check during infrastructure resolution	16	1000	512 × 512

Shared training configuration:

Property	Value
Base model	`black-forest-labs/FLUX.1-schnell`
Learning rate	5e-5
Trigger token	`ipsnackpkg`
Precision	bfloat16
Training hardware	NVIDIA A100 (40 GB) on Google Colab Pro
Wall-clock training time (primary)	≈ 3 h 40 min

The FLUX learning rate (5e-5) is lower than the SDXL counterpart (1e-4) to account for FLUX's greater sensitivity to gradient magnitude.

Pinned dependency configuration

FLUX LoRA training in the diffusers ecosystem required pinning a specific dependency set due to incompatibilities on the diffusers main branch:

diffusers==0.32.0
transformers==4.45.2
peft==0.13.2
accelerate==1.1.1

Reproducing training requires this pinned set; see the dissertation methodology log for full context.
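Before launching a training run, it can help to confirm that the active environment matches the pinned set. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `find_mismatches` are illustrative and are not part of the companion repository.

```python
from importlib import metadata

# Pinned versions copied from the list above.
PINNED = {
    "diffusers": "0.32.0",
    "transformers": "4.45.2",
    "peft": "0.13.2",
    "accelerate": "1.1.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when the environment matches the pins; any entry it does return names a package to reinstall at the pinned version.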

Training data

The training set consists of 311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence). It is the same corpus used to train the SDXL counterpart LoRA. Per-image provenance is preserved in the code repository as `data/packaging_metadata.csv`.

Intended use

This model is intended for research on how the choice of base model contributes to packaging-domain image generation. The dissertation's RQ1 asks whether fine-tuned FLUX produces superior packaging generation compared to fine-tuned SDXL under comparable LoRA configurations. This model is the FLUX side of that comparison.

How to use

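import torch
from diffusers import FluxPipeline

# Load the base model in bfloat16, the precision used for training.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Attach the primary (1024 px, 2000-step) checkpoint.
pipe.load_lora_weights(
    "Vclord/flux-packaging-lora-indian-snacks",
    weight_name="flux_packaging_lora_r16_res1024_steps2000.safetensors",
)
# No adapter_name was passed above, so diffusers registers the adapter as "default_0".
# Apply it at the recommended scale of 0.5.
pipe.set_adapters(["default_0"], adapter_weights=[0.5])

# The prompt starts with the trigger token.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet"
# FLUX.1-schnell is a distilled model: it runs in a few steps with guidance disabled.
image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0.0,
    width=1024,
    height=1024,
    max_sequence_length=256,
).images[0]
image.save("output.png")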

Recommended LoRA scale: 0.5

Why 0.5 and not 1.0?

Unlike SDXL LoRAs which are conventionally used at scale 1.0, this FLUX LoRA operates best at scale 0.5. A diagnostic comparison at scales 0.3, 0.5, and 1.0 confirmed that scale 1.0 over-asserts on FLUX outputs, producing hazy ghosted packets — a known phenomenon in the FLUX LoRA community. Scale 0.5 preserves the trained LoRA contribution without inducing the over-assertion failure mode.
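A scale comparison of this kind can be sketched as a short loop over adapter weights. This is an illustrative reconstruction, not the dissertation's diagnostic script: `sweep_lora_scales` and `make_generator` are hypothetical names, and the sketch assumes the adapter was loaded without an explicit name, so that it is registered as `default_0`.

```python
def sweep_lora_scales(pipe, prompt, scales, adapter_name="default_0",
                      make_generator=None, **call_kwargs):
    """Generate one image per LoRA scale and return them keyed by scale.

    pipe           -- a loaded pipeline with the LoRA already attached
    make_generator -- optional zero-argument callable returning a freshly
                      seeded generator, so every scale starts from the
                      same noise (assumed helper, supplied by the caller)
    call_kwargs    -- forwarded unchanged to the pipeline call
    """
    images = {}
    for scale in scales:
        # Re-weight the adapter before each generation.
        pipe.set_adapters([adapter_name], adapter_weights=[scale])
        generator = make_generator() if make_generator else None
        result = pipe(prompt, generator=generator, **call_kwargs)
        images[scale] = result.images[0]
    return images
```

With a pipeline loaded as in the usage example, a call might pass `scales=[0.3, 0.5, 1.0]`, `make_generator=lambda: torch.Generator("cuda").manual_seed(0)`, `num_inference_steps=4` and `guidance_scale=0.0`. Holding the seed fixed means that differences between the three images come from the LoRA scale alone.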

Evaluation

The FLUX vs SDXL comparison was conducted as a LoRA-only experiment (no IP-Adapter, no ControlNet) because mature FLUX equivalents of those components were not available at the time of writing. The comparison therefore answers a narrower sub-question of RQ1: whether FLUX is a better base model for the packaging-domain LoRA task in isolation.

Quantitative metrics across 24 comparison images (3 prompts × 2 seeds × 4 conditions):

Configuration	CLIP-img (higher is better)	CLIP-txt (higher is better)	LPIPS (lower is better)
SDXL baseline (no LoRA)	—	—	—
SDXL + LoRA + Plus + ControlNet (full pipeline)	0.552	0.320	0.782
FLUX baseline (no LoRA)	0.475	0.255	0.795
FLUX + LoRA at scale 0.5	0.528	0.306	0.665
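To make the effect of the LoRA on the FLUX rows easier to read, the relative change of each metric can be computed directly from the table. The sketch below uses only the values reported above; the dictionary keys are illustrative names.

```python
# Metric values copied from the FLUX rows of the table above.
FLUX_BASELINE = {"clip_img": 0.475, "clip_txt": 0.255, "lpips": 0.795}
FLUX_LORA_05 = {"clip_img": 0.528, "clip_txt": 0.306, "lpips": 0.665}


def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


changes = {
    name: relative_change(FLUX_BASELINE[name], FLUX_LORA_05[name])
    for name in FLUX_BASELINE
}

for name, change in changes.items():
    # CLIP scores are similarities (an increase is an improvement);
    # LPIPS is a distance (a decrease is an improvement).
    print(f"{name}: {change:+.1%}")
```

Both CLIP scores rise and LPIPS falls relative to the FLUX baseline, so all three metrics move in the favourable direction when the LoRA is applied at scale 0.5.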

Intra-rater reliability for the FLUX comparison spike (n = 24), Cohen's weighted kappa with linear weights:

Axis	κ
Text legibility	0.740
Packaging plausibility	0.559
Visual quality	0.554

(Regional appropriateness was not scored for this spike because the FLUX comparison prompts were not folk-art conditioned.)
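For readers unfamiliar with the statistic, linearly weighted Cohen's kappa can be computed from two scoring passes over the same images as in the sketch below. It is a dependency-free illustration rather than the dissertation's analysis code, and the ordinal rating scale passed as `categories` is an assumption, since the scale is not stated here.

```python
def linear_weighted_kappa(first_pass, second_pass, categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    first_pass, second_pass -- equal-length sequences of ratings
    categories              -- the ordered rating scale, lowest to highest
    """
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("both passes must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {category: i for i, category in enumerate(categories)}
    n = len(first_pass)

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first_pass, second_pass):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each pass.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - weighted_observed / weighted_expected
```

Identical passes give κ = 1, and the linear weights penalise a two-step disagreement twice as heavily as a one-step disagreement, which suits ordinal quality scores.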

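Headline finding: FLUX + LoRA at scale 0.5 achieves the lowest LPIPS distance to real packaging among the configurations with reported metrics (0.665, against 0.782 for the SDXL full pipeline and 0.795 for the FLUX baseline), while the SDXL full pipeline scores slightly higher on CLIP-img and CLIP-txt. The LPIPS result suggests that base-model choice contributes more to packaging-domain quality than the specific fine-tuning strategy. This finding is bounded by the LoRA-only comparison scope; the full-pipeline comparison is future work.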

Limitations

LoRA-only configuration; no IP-Adapter or ControlNet conditioning is applied during inference with this model. Folk-art style transfer is not part of the FLUX pipeline at the time of writing.
Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised.
The 1024-resolution LoRA is the primary deliverable; the 512-resolution LoRA was produced during infrastructure resolution and behaves similarly at scale 0.5 but is not the main artefact.
Single-rater evaluation methodology with intra-rater reliability protocol; see dissertation for full discussion.

Citation

If you use this LoRA in research, please cite:

@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}

Companion repository and SDXL counterpart

Full code: https://github.com/Vclord/folk-art-packaging-generation
SDXL counterpart LoRA: https://huggingface.co/Vclord/sdxl-packaging-lora-indian-snacks

Licence

apache-2.0
