SDXL Packaging LoRA — Indian Snacks Domain

A LoRA adaptation of Stable Diffusion XL fine-tuned for the Indian snack packaging visual domain.

Part of the MSc dissertation "Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models: A Comparative Study of SDXL and FLUX with LoRA and IP-Adapter Conditioning" — University of Stirling, MSc Artificial Intelligence, 2026.

Model details

Property	Value
Base model	`stabilityai/stable-diffusion-xl-base-1.0`
Rank	16
Training steps	2000
Learning rate	1e-4
Resolution	1024 × 1024
Trigger token	`ipsnackpkg`
Optimizer	AdamW-8bit (via `bitsandbytes`)
Precision	bfloat16
Training hardware	NVIDIA A100 (40 GB) on Google Colab Pro
Wall-clock training time	≈ 2.5 hours
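
For a sense of what rank 16 means in parameter terms: a rank-r LoRA adds two low-rank matrices to each adapted linear layer, A (r × d_in) and B (d_out × r), for r · (d_in + d_out) trainable parameters per layer. The sketch below works this out for a single layer. The 1280 × 1280 layer shape is an illustrative assumption, not a statement about which SDXL layers this adapter targets.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA of the given rank adds to one d_in -> d_out linear layer."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


RANK = 16            # from the table above
D_IN = D_OUT = 1280  # hypothetical layer shape, chosen only for illustration

added = lora_param_count(D_IN, D_OUT, RANK)
frozen = D_IN * D_OUT  # weights of the frozen base layer

print(added, frozen, f"{added / frozen:.2%}")  # 40960 1638400 2.50%
```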

Training data

311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence), manually triaged for image quality and packaging-domain fit. Provenance, source URL, licence, and retrieval date are recorded per image in `data/packaging_metadata.csv` in the code repository.
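
A minimal sketch of how the per-image metadata could be checked for completeness before reuse. The column names and sample rows are hypothetical; consult the actual header of `data/packaging_metadata.csv` in the repository. The example reads from an in-memory string so it runs as-is.

```python
import csv
import io

# Hypothetical column names; the real CSV header may differ.
REQUIRED = ("image_id", "source_url", "licence", "date_retrieved")


def rows_missing_provenance(csv_file, required=REQUIRED):
    """Return the line numbers of records where any required field is empty.

    Assumes one record per physical line, so the header is line 1.
    """
    reader = csv.DictReader(csv_file)
    incomplete = []
    for line_no, row in enumerate(reader, start=2):
        if any(not (row.get(col) or "").strip() for col in required):
            incomplete.append(line_no)
    return incomplete


# Made-up sample data; replace with open("data/packaging_metadata.csv", newline="").
sample = io.StringIO(
    "image_id,source_url,licence,date_retrieved\n"
    "img_001,https://example.org/a,CC-BY-SA,2025-01-01\n"
    "img_002,,CC-BY-SA,2025-01-01\n"
)
print(rows_missing_provenance(sample))  # [3]
```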

All training images share a single uniform caption containing the trigger token `ipsnackpkg`. Per-image captioning was considered and rejected so that the LoRA's contribution can be attributed more cleanly within the dissertation's claim structure.

Intended use

This LoRA is intended for research use in cultural-style-transfer experiments on commercial packaging design. It forms one component of a four-component diffusion pipeline:

Packaging-domain LoRA (this model)
IP-Adapter Plus for folk-art style transfer (Madhubani, Tanjore, Kalighat)
Canny ControlNet for structural conditioning
Post-hoc PIL text compositing for regional-script labels

Outputs are concept-level and have not undergone commercial post-processing (CMYK conversion, regulatory compliance, brand-asset integration).

How to use

import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model in half precision.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# Attach the packaging LoRA weights from the Hub.
pipe.load_lora_weights(
    "Vclord/sdxl-packaging-lora-indian-snacks",
    weight_name="sdxl_packaging_lora_r16_steps2000.safetensors",
)

# The prompt must include the trigger token `ipsnackpkg`.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet, professional product photography, white background"
image = pipe(
    prompt,
    num_inference_steps=25,
    guidance_scale=7.5,
    width=1024,
    height=1024,
    cross_attention_kwargs={"scale": 1.0},  # LoRA scale
).images[0]
image.save("output.png")

Recommended LoRA scale: 1.0
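
To see what the adapter contributes at the recommended scale, it can help to render the same prompt at several LoRA scales. The helper below is a sketch rather than part of the released code: `sweep_lora_scale` and `generator_factory` are hypothetical names, and `pipe` is assumed to be the pipeline loaded above with the LoRA weights attached.

```python
def sweep_lora_scale(pipe, prompt, scales=(0.0, 0.5, 1.0),
                     generator_factory=None, **pipe_kwargs):
    """Run `pipe` once per LoRA scale and return {scale: first image}.

    `generator_factory`, if given, is called before every run so each scale
    starts from the same noise; a single shared generator would advance
    between calls and confound the comparison.
    """
    images = {}
    for scale in scales:
        call_kwargs = dict(pipe_kwargs)
        if generator_factory is not None:
            call_kwargs["generator"] = generator_factory()
        result = pipe(
            prompt,
            cross_attention_kwargs={"scale": scale},  # same mechanism as the snippet above
            **call_kwargs,
        )
        images[scale] = result.images[0]
    return images


# Usage with the pipeline and prompt from "How to use" (requires torch and a GPU):
# images = sweep_lora_scale(
#     pipe, prompt,
#     generator_factory=lambda: torch.Generator("cuda").manual_seed(0),
#     num_inference_steps=25, guidance_scale=7.5, width=1024, height=1024,
# )
# for scale, img in images.items():
#     img.save(f"output_scale_{scale}.png")
```

A scale of 0.0 effectively disables the adapter, which gives a base-model reference image for the same seed.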

For the full four-component pipeline (LoRA + IP-Adapter Plus + ControlNet + text compositing), see the code repository.

Evaluation

Evaluation methodology and results are reported in the dissertation. The table below gives intra-rater reliability for the IP-Adapter variant comparison spike (n = 78), measured as Cohen's weighted kappa with linear weights:

Axis	κ
Text legibility	0.844
Regional appropriateness	0.465
Packaging plausibility	0.742
Visual quality	0.857
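
For readers who want to reproduce the statistic, linearly weighted Cohen's kappa is κ = 1 − Σ w_ij · O_ij / Σ w_ij · E_ij, where O is the observed joint distribution of the two rating passes, E is the distribution expected from their marginals, and w_ij = |i − j| / (k − 1) for k ordered categories. The sketch below implements that formula in plain Python. The 1 to 5 rating scale and the example ratings are assumptions for illustration, not the dissertation's data.

```python
from collections import Counter


def linear_weighted_kappa(first, second, categories):
    """Cohen's kappa with linear disagreement weights for two rating passes.

    `categories` lists every possible rating in ascending order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(first)

    observed = Counter((index[a], index[b]) for a, b in zip(first, second))
    row = Counter(index[a] for a in first)    # marginals of the first pass
    col = Counter(index[b] for b in second)   # marginals of the second pass

    def weight(i, j):
        return abs(i - j) / (k - 1)

    obs_disagreement = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    exp_disagreement = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


# Made-up ratings on an assumed 1-5 scale: the same rater scoring six images twice.
first_pass = [1, 2, 3, 4, 5, 3]
second_pass = [1, 2, 3, 4, 4, 3]
print(linear_weighted_kappa(first_pass, second_pass, categories=(1, 2, 3, 4, 5)))
```

If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(first_pass, second_pass, weights="linear")` can serve as a cross-check.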

Quantitative metrics at the full pipeline's chosen operating point (LoRA = 1.0, IP-Adapter Plus = 0.7, ControlNet = 0.4):

Metric	Value	Higher = better?
CLIP-image similarity	0.552	✓
CLIP-text similarity	0.320	✓
DINOv2 style similarity	0.245	✓
LPIPS perceptual distance	0.782	✗
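
Embedding-based scores of this kind are conventionally cosine similarities between feature vectors. Assuming that convention applies to the CLIP and DINOv2 rows (the dissertation defines the exact protocol), the computation reduces to the sketch below once the embeddings have been extracted. The toy vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 2-D vectors standing in for image or text embeddings.
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```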

Limitations

Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised
Diffusion text rendering remains unreliable; downstream PIL text compositing is recommended for regional-script labels
Style-transfer fidelity varies by tradition (Madhubani > Tanjore > Kalighat); Kalighat outputs reproduce canonical iconography but transfer the tradition's gesture-economy and brushwork less reliably
Single-rater evaluation methodology with intra-rater reliability protocol; see dissertation for full discussion of evaluation limitations

Citation

If you use this LoRA in research, please cite:

@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}

Companion repository

Full code, evaluation artefacts, methodology log, and pipeline implementation: https://github.com/Vclord/folk-art-packaging-generation

Licence

apache-2.0
