SDXL Packaging LoRA: Indian Snacks Domain
A LoRA adaptation of Stable Diffusion XL fine-tuned for the Indian snack packaging visual domain.
Part of the MSc dissertation "Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models: A Comparative Study of SDXL and FLUX with LoRA and IP-Adapter Conditioning", University of Stirling, MSc Artificial Intelligence, 2026.
Model details
| Property | Value |
|---|---|
| Base model | stabilityai/stable-diffusion-xl-base-1.0 |
| Rank | 16 |
| Training steps | 2000 |
| Learning rate | 1e-4 |
| Resolution | 1024 × 1024 |
| Trigger token | ipsnackpkg |
| Optimizer | AdamW-8bit (via bitsandbytes) |
| Precision | bfloat16 |
| Training hardware | NVIDIA A100 (40 GB) on Google Colab Pro |
| Wall-clock training time | ≈ 2.5 hours |
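To give a feel for what rank 16 means in parameter terms, here is a minimal arithmetic sketch. LoRA adds two low-rank matrices to a linear layer, so the added parameter count is `rank * (in_features + out_features)`. The layer sizes used below (640 and 1280) are illustrative attention widths, not a statement of which layers this LoRA adapts; the card does not list the target modules, and the function names are hypothetical.

```python
def lora_param_count(in_features: int, out_features: int, rank: int = 16) -> int:
    """Parameters LoRA adds to one linear layer (bias ignored).

    The update is B @ A with A of shape (rank, in_features)
    and B of shape (out_features, rank).
    """
    return rank * (in_features + out_features)


def full_param_count(in_features: int, out_features: int) -> int:
    """Parameters in the frozen weight matrix of the same layer (bias ignored)."""
    return in_features * out_features


if __name__ == "__main__":
    # Illustrative layer widths only; the adapted modules are an assumption.
    for width in (640, 1280):
        added = lora_param_count(width, width)
        frozen = full_param_count(width, width)
        print(f"{width}x{width}: LoRA adds {added} params "
              f"({added / frozen:.1%} of the frozen matrix)")
```

At rank 16 the adapter is a small fraction of each adapted matrix, which is consistent with training on a single GPU in a few hours.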
Training data
311 images of Indian snack packaging sourced from Open Food Facts (CC-BY-SA licence), manually triaged for image quality and packaging-domain fit. Per-image provenance, source URL, licence, and date retrieved are preserved in the code repository as data/packaging_metadata.csv.
All training images shared a single uniform caption containing the trigger token ipsnackpkg. Per-image captioning was considered and rejected so that the LoRA's contribution can be attributed more cleanly within the dissertation's claim structure.
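The uniform-caption setup can be sketched as one metadata record per image, all carrying the same text. This is a sketch under assumptions, not the dissertation's actual preprocessing: the caption wording beyond the trigger token is hypothetical, the `file_name`/`text` record layout follows the common `metadata.jsonl` convention for image folders, and the function names are invented for illustration.

```python
import json
from pathlib import Path

TRIGGER_TOKEN = "ipsnackpkg"  # from the model card
# Hypothetical wording: the card only states that the caption contains the token.
UNIFORM_CAPTION = f"{TRIGGER_TOKEN}, Indian snack packaging"

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_uniform_metadata(image_dir, caption=UNIFORM_CAPTION):
    """Return one {"file_name", "text"} record per image, all with the same caption."""
    image_dir = Path(image_dir)
    return [
        {"file_name": path.name, "text": caption}
        for path in sorted(image_dir.iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]


def write_metadata(image_dir, out_name="metadata.jsonl"):
    """Write the records as JSON Lines next to the images; return the record count."""
    records = build_uniform_metadata(image_dir)
    out_path = Path(image_dir) / out_name
    with out_path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return len(records)
```

Because every record has identical text, any variation the model learns comes from the images rather than from caption differences.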
Intended use
This LoRA is intended for research use in cultural-style-transfer experiments on commercial packaging design. It forms one component of a four-component diffusion pipeline:
- Packaging-domain LoRA (this model)
- IP-Adapter Plus for folk-art style transfer (Madhubani, Tanjore, Kalighat)
- Canny ControlNet for structural conditioning
- Post-hoc PIL text compositing for regional-script labels
Outputs are concept-level and have not been subjected to commercial post-processing (CMYK conversion, regulatory compliance, brand-asset integration).
How to use
from diffusers import StableDiffusionXLPipeline
import torch
# Load the SDXL base model in half precision and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")
# Attach the packaging LoRA weights from the Hub.
pipe.load_lora_weights(
    "Vclord/sdxl-packaging-lora-indian-snacks",
    weight_name="sdxl_packaging_lora_r16_steps2000.safetensors",
)
# The prompt starts with the trigger token ipsnackpkg.
prompt = "ipsnackpkg, Front-facing product photograph of an Indian snack packet, professional product photography, white background"
image = pipe(
    prompt,
    num_inference_steps=25,
    guidance_scale=7.5,
    width=1024,
    height=1024,
    cross_attention_kwargs={"scale": 1.0},  # LoRA scale
).images[0]
image.save("output.png")
Recommended LoRA scale: 1.0
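Since every training caption contained the trigger token, prompts presumably need to include it for the LoRA to take effect. The helper below is a small convenience sketch for building prompts consistently. The function name `with_trigger` is hypothetical and not part of the companion repository.

```python
TRIGGER_TOKEN = "ipsnackpkg"  # from the model card


def with_trigger(prompt: str, token: str = TRIGGER_TOKEN) -> str:
    """Prepend the trigger token unless the prompt already contains it as a word."""
    cleaned = prompt.strip()
    # Treat commas as separators so "ipsnackpkg, ..." counts as containing the token.
    words = cleaned.replace(",", " ").split()
    if token in words:
        return cleaned
    return f"{token}, {cleaned}"
```

For example, `with_trigger("Indian snack packet, white background")` returns the same prompt with `ipsnackpkg, ` in front, while a prompt that already starts with the token is returned unchanged.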
For the full four-component pipeline (LoRA + IP-Adapter Plus + ControlNet + text compositing), see the code repository.
Evaluation
Evaluation methodology and results are reported in the dissertation. Intra-rater reliability for the IP-Adapter variant comparison spike (n = 78) was measured with Cohen's weighted kappa using linear weights:
| Axis | κ |
|---|---|
| Text legibility | 0.844 |
| Regional appropriateness | 0.465 |
| Packaging plausibility | 0.742 |
| Visual quality | 0.857 |
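For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of Cohen's kappa with linear weights, applied to two rating passes over the same items. It assumes ratings are integers on a 0-based ordinal scale with `num_categories` levels; the card does not state the rating scale used in the dissertation, so the scale and the function name are assumptions.

```python
def linear_weighted_kappa(first_pass, second_pass, num_categories):
    """Cohen's kappa with linear disagreement weights.

    Ratings must be integers in 0..num_categories-1, with num_categories >= 2.
    """
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("rating lists must be non-empty and of equal length")
    if num_categories < 2:
        raise ValueError("need at least two rating categories")
    n = len(first_pass)
    k = num_categories
    # Joint counts of (first rating, second rating) and their marginals.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(first_pass, second_pass):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement
```

Linear weights penalise a two-step disagreement twice as much as a one-step disagreement, which suits ordinal rating axes like the four above.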
Quantitative metrics at the full pipeline's chosen operating point (LoRA = 1.0, IP-Adapter Plus = 0.7, ControlNet = 0.4):
| Metric | Value | Higher = better? |
|---|---|---|
| CLIP-image similarity | 0.552 | Yes |
| CLIP-text similarity | 0.320 | Yes |
| DINOv2 style similarity | 0.245 | Yes |
| LPIPS perceptual distance | 0.782 | No |
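Embedding-based similarity scores such as the CLIP and DINOv2 rows are commonly computed as the cosine similarity between two embedding vectors. The sketch below shows that computation on plain Python lists. It assumes this is how the scores were derived, which the card does not confirm, and it does not cover LPIPS, which is a learned distance rather than a cosine score.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

In practice `u` and `v` would be embeddings of a generated image and a reference image or text prompt; the score ranges from -1 to 1, with higher values indicating closer alignment.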
Limitations
- Trained on a small dataset (311 images); generalisation beyond Indian snack packaging is not characterised
- Diffusion text rendering remains unreliable; downstream PIL text compositing is recommended for regional-script labels
- Style-transfer fidelity varies by tradition (Madhubani > Tanjore > Kalighat); Kalighat outputs reproduce canonical iconography but transfer the tradition's gesture-economy and brushwork less reliably
- Single-rater evaluation methodology with intra-rater reliability protocol; see dissertation for full discussion of evaluation limitations
Citation
If you use this LoRA in research, please cite:
@mastersthesis{chandra2026folkart,
title = {Injecting Regional Cultural Aesthetics into Product Packaging via Reference-Conditioned Diffusion Models},
author = {Chandra, Vivek},
year = {2026},
school = {University of Stirling},
type = {MSc Dissertation, Artificial Intelligence}
}
Companion repository
Full code, evaluation artefacts, methodology log, and pipeline implementation: https://github.com/Vclord/folk-art-packaging-generation
Licence
apache-2.0