WikiArt-Shards
A small conditional image generation model that produces abstract, art-style images at 256×256 resolution.
This repository is inference-only. It contains the runtime code, model weights, label mappings, scheduler configuration, and examples needed to generate images.
Status: This model is still in training. The current weights are a very early checkpoint and are shared mainly for testing, experimentation, and reproducible inference. Output quality, conditioning accuracy, and generation stability are expected to improve in future checkpoints.
Conditions
The model supports three optional condition fields:
artist
genre
style
Omit any field to fall back to its ANY_* token (ANY_ARTIST, ANY_GENRE, or ANY_STYLE).
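The fallback rule can be pictured with a short sketch. This is illustrative only: resolve_conditions is a hypothetical helper, not a function from this repository, and the way the runtime actually looks up tokens may differ. Only the ANY_ARTIST, ANY_GENRE, and ANY_STYLE names come from the option descriptions in this README.

```python
def resolve_conditions(artist=None, genre=None, style=None):
    """Map each omitted field to its ANY_* placeholder token (hypothetical helper)."""
    return {
        "artist": artist if artist is not None else "ANY_ARTIST",
        "genre": genre if genre is not None else "ANY_GENRE",
        "style": style if style is not None else "ANY_STYLE",
    }


# Only the style is given; artist and genre fall back to their ANY_* tokens.
conditions = resolve_conditions(style="Cubism")
print(conditions)
```

Any combination of the three fields can be supplied, including none at all, in which case generation is conditioned on the three ANY_* tokens.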
Quick start
pip install -r requirements.txt
python generate.py --model_path . --artist "pablo-picasso" --genre "portrait" --style "Cubism" --scheduler ddim --num_inference_steps 50 --guidance_scale 2.0 --num_images 16 --grid --seed 42
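For scripted runs, the same invocation can be assembled from Python. The following is a minimal sketch, assuming generate.py sits in the current working directory and accepts the flags shown above; build_command is a hypothetical helper and is not part of this repository.

```python
import sys


def build_command(model_path=".", artist=None, genre=None, style=None,
                  num_images=16, seed=None, grid=False):
    """Assemble an argument list for generate.py (hypothetical helper)."""
    cmd = [sys.executable, "generate.py", "--model_path", model_path]
    # Omitted condition fields are simply left off the command line.
    for flag, value in (("--artist", artist), ("--genre", genre), ("--style", style)):
        if value is not None:
            cmd += [flag, value]
    cmd += ["--num_images", str(num_images)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if grid:
        cmd.append("--grid")
    return cmd


cmd = build_command(artist="pablo-picasso", genre="portrait",
                    style="Cubism", seed=42, grid=True)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with labels that contain spaces or special characters.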
List labels
python generate.py --model_path . --list_labels --label_limit 25
Supported generation options
--model_path Path to the cloned model repository.
--artist Artist label. Omit for ANY_ARTIST.
--genre Genre label. Omit for ANY_GENRE.
--style Style label. Omit for ANY_STYLE.
--num_images Number of images to generate.
--batch_size Images per generation batch. 0 means all images at once.
--scheduler ddpm, ddim, or dpm.
--num_inference_steps Number of denoising steps.
--guidance_scale Classifier-free guidance scale. 1.0 disables guidance.
--output_dir Directory for generated images.
--grid Save a grid image.
--grid_rows Grid rows. 0 means automatic layout.
--grid_cols Grid columns. 0 means automatic layout.
--seed Random seed. If omitted, a random seed is generated.
--device auto, cuda, or cpu.
--allow_tf32 Enable or disable TF32 on compatible NVIDIA GPUs.
--save_config Save generation_config.json.
--no-save_config Do not save generation_config.json.
--list_labels Print available label info and exit.
--label_limit Number of labels to preview with --list_labels.
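Two of the options above have numeric edge cases that are easier to see in code: --guidance_scale 1.0 disabling guidance, and --batch_size 0 meaning a single batch. The sketch below uses the commonly used classifier-free guidance formula and a plain batch split. Both functions are hypothetical illustrations under those assumptions, not code from this repository, and the real implementation operates on tensors rather than Python lists.

```python
def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions (common CFG form).

    With guidance_scale == 1.0 the result equals the conditional prediction,
    which is why 1.0 effectively disables guidance.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def plan_batches(num_images, batch_size):
    """Return the size of each generation batch; batch_size 0 means one batch."""
    if num_images <= 0:
        return []
    if batch_size <= 0 or batch_size >= num_images:
        return [num_images]
    full, rest = divmod(num_images, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


print(apply_guidance([0.0, 1.0], [2.0, 3.0], 1.0))  # same as the conditional input
print(apply_guidance([0.0, 1.0], [2.0, 3.0], 2.0))  # pushed further from unconditional
print(plan_batches(16, 0))
print(plan_batches(16, 6))
```

Scales above 1.0 push the prediction further in the direction of the conditions, which typically trades sample diversity for closer adherence to the requested labels.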
Notes
This is a compact experimental image generation model. It is intended for demos, exploration, and reproducible local inference. It is not comparable to large text-to-image foundation models.
