SDXL Packaging LoRA: Indian Snacks Domain

A LoRA adaptation of Stable Diffusion XL fine-tuned for the Indian snack packaging visual domain.

Part of the MSc dissertation "Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models: A Comparative Study of SDXL and FLUX with LoRA and IP-Adapter Conditioning", University of Stirling, MSc Artificial Intelligence, 2026.

Model details

Property                   Value
Base model                 stabilityai/stable-diffusion-xl-base-1.0
Rank                       16
Training steps             2000
Learning rate              1e-4
Resolution                 1024 × 1024
Trigger token              ipsnackpkg
Optimizer                  AdamW-8bit (via bitsandbytes)
Precision                  bfloat16
Training hardware          NVIDIA A100 (40 GB) on Google Colab Pro
Wall-clock training time   ≈ 2.5 hours

Training data

311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence), manually triaged for image quality and packaging-domain fit. Per-image provenance, source URL, licence, and date retrieved are preserved in the code repository as data/packaging_metadata.csv.
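
A minimal sketch of how the provenance file might be checked for completeness, assuming it is a standard CSV with one row per image. The column names below (`image`, `source_url`, `licence`, `date_retrieved`) are hypothetical and the sample rows are made up; the actual header of data/packaging_metadata.csv may differ.

```python
import csv
import io

# Hypothetical column names; adjust to the real header of
# data/packaging_metadata.csv in the code repository.
REQUIRED_COLUMNS = ("image", "source_url", "licence", "date_retrieved")


def find_incomplete_rows(csv_file):
    """Return the 1-based data-row numbers that lack any provenance field."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    incomplete = []
    for row_number, row in enumerate(reader, start=1):
        # A short row yields None for absent fields, so normalise to "".
        if any(not (row[c] or "").strip() for c in REQUIRED_COLUMNS):
            incomplete.append(row_number)
    return incomplete


# Illustrative in-memory sample (made-up rows, not real dataset entries).
sample = io.StringIO(
    "image,source_url,licence,date_retrieved\n"
    "img_001.jpg,https://example.org/a,CC-BY-SA,2025-01-01\n"
    "img_002.jpg,,CC-BY-SA,2025-01-01\n"
)
print(find_incomplete_rows(sample))
```

To run the same check against the real file, pass `open("data/packaging_metadata.csv", newline="", encoding="utf-8")` in place of the in-memory sample.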

All training images shared a single uniform caption containing the trigger token ipsnackpkg. Per-image captioning was considered but rejected, so that the LoRA's contribution can be attributed more cleanly within the dissertation's claim structure.
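
The uniform-caption setup can be sketched as writing one metadata record per image, each with an identical caption. Only the trigger token ipsnackpkg comes from this card. The caption wording, the file names, and the `file_name`/`text` keys (the convention used by Hugging Face `datasets` image folders with a `metadata.jsonl` file) are assumptions for illustration.

```python
import json

TRIGGER_TOKEN = "ipsnackpkg"  # from the model card
# Assumed caption wording; the actual training caption is not stated here.
UNIFORM_CAPTION = f"{TRIGGER_TOKEN}, Indian snack packaging"


def build_metadata_lines(image_names, caption=UNIFORM_CAPTION):
    """Return one JSON line per image, all sharing the same caption."""
    return [
        json.dumps({"file_name": name, "text": caption})
        for name in sorted(image_names)
    ]


# Hypothetical file names, used only to show the output shape.
lines = build_metadata_lines(["pack_002.jpg", "pack_001.jpg"])
print("\n".join(lines))
```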

Intended use

This LoRA is intended for research use in cultural-style-transfer experiments on commercial packaging design. It forms one component of a four-component diffusion pipeline:

  1. Packaging-domain LoRA (this model)
  2. IP-Adapter Plus for folk-art style transfer (Madhubani, Tanjore, Kalighat)
  3. Canny ControlNet for structural conditioning
  4. Post-hoc PIL text compositing for regional-script labels

Outputs are concept-level and have not undergone commercial post-processing (CMYK conversion, regulatory compliance, brand-asset integration).

How to use

import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model in half precision on the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# Attach the packaging-domain LoRA weights from the Hub.
pipe.load_lora_weights(
    "Vclord/sdxl-packaging-lora-indian-snacks",
    weight_name="sdxl_packaging_lora_r16_steps2000.safetensors",
)

# The prompt must start with the trigger token used during training.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet, professional product photography, white background"
image = pipe(
    prompt,
    num_inference_steps=25,
    guidance_scale=7.5,
    width=1024,
    height=1024,
    cross_attention_kwargs={"scale": 1.0},  # LoRA scale
).images[0]
image.save("output.png")

Recommended LoRA scale: 1.0

For the full four-component pipeline (LoRA + IP-Adapter Plus + ControlNet + text compositing), see the code repository.

Evaluation

Evaluation methodology and results are reported in the dissertation. Intra-rater reliability for the IP-Adapter variant comparison spike (n = 78), measured as Cohen's weighted kappa with linear weights, was:

Axis                       κ
Text legibility            0.844
Regional appropriateness   0.465
Packaging plausibility     0.742
Visual quality             0.857
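
For readers unfamiliar with the statistic, the sketch below computes Cohen's weighted kappa with linear weights from two rating passes by the same rater. It is an illustration, not the dissertation's evaluation script: the 1 to 5 ordinal scale and the ratings are made-up assumptions. scikit-learn's `cohen_kappa_score(y1, y2, weights="linear")` computes the same statistic.

```python
from collections import Counter


def linear_weighted_kappa(first_pass, second_pass, categories):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1)."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("need two equal-length, non-empty rating lists")
    k, n = len(categories), len(first_pass)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    observed = Counter((index[a], index[b]) for a, b in zip(first_pass, second_pass))
    row = Counter(index[a] for a in first_pass)   # marginals of the first pass
    col = Counter(index[b] for b in second_pass)  # marginals of the second pass
    observed_disagreement = expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * (row[i] / n) * (col[j] / n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


# Made-up ratings on an assumed 1-5 ordinal scale.
first = [1, 2, 3, 4, 5, 3, 2, 4]
second = [1, 2, 3, 4, 4, 3, 3, 4]
print(round(linear_weighted_kappa(first, second, categories=[1, 2, 3, 4, 5]), 3))
```

Linear weights penalise a disagreement in proportion to its distance on the ordinal scale, so a 4 versus 5 mismatch costs less than a 1 versus 5 mismatch.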

Quantitative metrics at the full pipeline's chosen operating point (LoRA = 1.0, IP-Adapter Plus = 0.7, ControlNet = 0.4):

Metric                      Value   Higher = better?
CLIP-image similarity       0.552   ✓
CLIP-text similarity        0.320   ✓
DINOv2 style similarity     0.245   ✓
LPIPS perceptual distance   0.782   ✗
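
Embedding-based similarity scores such as the CLIP and DINOv2 metrics are commonly computed as the cosine similarity between two feature vectors. The sketch below assumes that convention; the exact protocol is defined in the dissertation, and the vectors are toy values rather than real model embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for a generated-image embedding and a reference embedding.
generated = [3.0, 4.0]
reference = [4.0, 3.0]
print(cosine_similarity(generated, reference))
```

LPIPS, by contrast, is a distance, which is why lower values are better in the table above.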

Limitations

  • Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised
  • Diffusion text rendering remains unreliable; downstream PIL text compositing is recommended for regional-script labels
  • Style-transfer fidelity varies by tradition (Madhubani > Tanjore > Kalighat); Kalighat outputs reproduce canonical iconography but transfer the tradition's gesture-economy and brushwork less reliably
  • Single-rater evaluation methodology with intra-rater reliability protocol; see dissertation for full discussion of evaluation limitations

Citation

If you use this LoRA in research, please cite:

@mastersthesis{chandra2026folkart,
  title  = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
  author = {Chandra, Vivek},
  year   = {2026},
  school = {University of Stirling},
  type   = {MSc Dissertation, Artificial Intelligence}
}

Companion repository

Full code, evaluation artefacts, methodology log, and pipeline implementation: https://github.com/Vclord/folk-art-packaging-generation

Licence

apache-2.0
