The JSON test training run was highly successful and merits a large expansion.

Installing dependencies…
  done.
HF token: secrets
WARNING:torchao:Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.10.0+cu128).
Unable to import `torchao` Tensor objects. This may affect loading checkpoints serialized with `torchao`
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_validators.py:205: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore.
  warnings.warn(
GPU: NVIDIA A100-SXM4-80GB

Loading VAE + CLIP from stable-diffusion-v1-5/stable-diffusion-v1-5…
config.json: 100%
 547/547 [00:00<00:00, 69.1kB/s]
vae/diffusion_pytorch_model.safetensors: 100%
 335M/335M [00:02<00:00, 378MB/s]
tokenizer_config.json: 100%
 806/806 [00:00<00:00, 112kB/s]
vocab.json: 
 1.06M/? [00:00<00:00, 41.2MB/s]
merges.txt: 
 525k/? [00:00<00:00, 39.4MB/s]
special_tokens_map.json: 100%
 472/472 [00:00<00:00, 67.1kB/s]
config.json: 100%
 617/617 [00:00<00:00, 72.3kB/s]
text_encoder/model.safetensors: 100%
 492M/492M [00:02<00:00, 493MB/s]
Loading weights: 100%
 196/196 [00:00<00:00, 3476.41it/s]
✓ VAE + CLIP loaded

Streaming test rows from AbstractPhil/synthetic-object-relations-json…
✓ 6 test rows

──── before (base lune)  (AbstractPhil/sd15-flow-lune-flux/flux_t2_6_pose_t4_6_port_t1_4/checkpoint-00018765/unet) ────
config.json: 
 1.83k/? [00:00<00:00, 211kB/s]
flux_t2_6_pose_t4_6_port_t1_4/checkpoint(…): 100%
 3.44G/3.44G [00:12<00:00, 524MB/s]
  ✓ generated 6 images (conditioned on json_prompt)

──── after: prompt-JSON  (AbstractPhil/sd15-flow-lune-json-prompt/checkpoint-00002500/unet) ────
config.json: 
 1.83k/? [00:00<00:00, 219kB/s]
checkpoint-00002500/unet/diffusion_pytor(…): 100%
 3.44G/3.44G [00:21<00:00, 390MB/s]
  ✓ generated 6 images (conditioned on json_prompt)

──── after: vit-JSON  (AbstractPhil/sd15-flow-lune-json-vit/checkpoint-00002000/unet) ────
config.json: 
 1.83k/? [00:00<00:00, 213kB/s]
checkpoint-00002000/unet/diffusion_pytor(…): 100%
 3.44G/3.44G [00:16<00:00, 388MB/s]
  ✓ generated 6 images (conditioned on vit_json_prompt)

Saved grid: /content/lune_before_after.png


────────────────────────────────────────────────────────────────────────
 Test prompts (row order)
────────────────────────────────────────────────────────────────────────
  #0  banana on pergola
     json_prompt    : {"subjects":[{"name":"banana"},{"name":"pergola"},{"name":"spirit photography"}],"actions":["on pergola"],"setting":"outdoor"}
     vit_json_prompt: {"subjects":[{"name":"bananas","attributes":["ripe","yellow","bunch"]},{"name":"trellis","attributes":["wooden"]},{"name":"foliage","attributes":["lush","green"]}],"actions":["hangs from a wooden trellis","surrounded by lush green foliage"],"setting":"outdoor"}
  #1  tomato next to banana
     json_prompt    : {"subjects":[{"name":"tomato","attributes":["highly detailed"]},{"name":"banana","attributes":["highly detailed"]}],"actions":["next to banana"],"setting":"unknown"}
     vit_json_prompt: {"subjects":[{"name":"tomato","attributes":["ripe","red"]},{"name":"zucchini","attributes":["partially peeled","green"]},{"name":"banana","attributes":["yellow"]},{"name":"orange","attributes":["halved"]},{"name":"wooden surface"},{"name":"lighting effect","attributes":["sunlit","artistic"]}],"actions":["arranged on a wooden surface"],"setting":"unknown"}
  #3  blueberry beside pepper
     json_prompt    : {"subjects":[{"name":"blueberry"},{"name":"pepper"}],"actions":["beside pepper"],"setting":"unknown"}
     vit_json_prompt: {"subjects":[{"name":"blueberries","attributes":["fresh"]},{"name":"bell pepper","attributes":["red"]},{"name":"leaves","attributes":["green","a few"]}],"actions":["close-up of fresh blueberries and a red bell pepper"],"setting":"unknown"}
  #4  raspberry beside squash on virtual reality platform
     json_prompt    : {"subjects":[{"name":"raspberry"},{"name":"squash"},{"name":"virtual reality platform","attributes":["stylized","h 704"]}],"actions":["beside squash on virtual reality platform"],"setting":"unknown"}
     vit_json_prompt: {"subjects":[{"name":"pumpkin","attributes":["golden-orange","dark stem"]},{"name":"leaves","attributes":["fresh","green"]},{"name":"raspberries","attributes":["bright red","two"]},{"name":"surface","attributes":["dark"]}],"actions":["surrounded by fresh green leaves and two bright red raspberries on a dark surface"],"setting":"unknown"}
  #5  white dress caught on divider
     json_prompt    : {"subjects":[{"name":"dress","attributes":["white"]},{"name":"divider"}],"actions":["caught on divider"],"setting":"unknown"}
     vit_json_prompt: {"subjects":[{"name":"wedding dress"},{"name":"hanger","attributes":["white"]},{"name":"fabric drapes","attributes":["cascading","pink"]},{"name":"curtain backdrop","attributes":["sheer"]}],"actions":["hangs on a white hanger","surrounded by cascading pink fabric drapes and a sheer curtain backdrop"],"setting":"indoor"}
  #6  cotton light fixture
     json_prompt    : {"subjects":[{"name":"light fixture","attributes":["cotton"]},{"name":"cosmic apocalypse"}],"actions":[],"setting":"unknown"}
     vit_json_prompt: {"subjects":[{"name":"Edison light bulb","attributes":["warm","glowing"]},{"name":"night sky","attributes":["dark","starry"]},{"name":"cotton buds"},{"name":"stars","attributes":["twinkling"]}],"actions":["hangs from a dark, starry night sky","surrounded by cotton buds and twinkling stars"],"setting":"outdoor"}
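
The prompts above appear to share one shape: a `subjects` list (each entry has a `name` and optional `attributes`), an `actions` list, and a `setting` string. That schema is inferred only from the six rows shown here. As a minimal sketch for inspecting a row, the hypothetical helper `flatten_prompt` below (not part of the training or evaluation script) parses a JSON prompt and flattens it into a readable phrase list, using the row #5 `json_prompt` as input.

```python
import json


def flatten_prompt(raw: str) -> str:
    """Flatten a JSON prompt into a comma-separated phrase list.

    Hypothetical helper for eyeballing rows; the field names are taken
    from the prompts printed in the log, nothing else is assumed.
    """
    data = json.loads(raw)
    parts = []
    for subject in data.get("subjects", []):
        # Attributes are optional; put them in front of the subject name.
        attrs = " ".join(subject.get("attributes", []))
        parts.append(f"{attrs} {subject['name']}".strip())
    parts.extend(data.get("actions", []))
    setting = data.get("setting", "unknown")
    if setting != "unknown":
        parts.append(f"{setting} setting")
    return ", ".join(parts)


# json_prompt of row #5, copied from the log above.
row5 = (
    '{"subjects":[{"name":"dress","attributes":["white"]},{"name":"divider"}],'
    '"actions":["caught on divider"],"setting":"unknown"}'
)
print(flatten_prompt(row5))  # white dress, divider, caught on divider
```

Flattening like this is only a reading aid. The models in this comparison are conditioned on the raw JSON strings, not on the flattened text.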

Reading the grid:
  • 'before' conditioned on JSON it never trained on — expect incoherent or
    prompt-ignoring output. That is the baseline the finetune has to beat.
  • 'after' columns should track the target's content. prompt-JSON vs vit-JSON
    shows whether image-aligned conditioning produced the better model.
  • All models share the same initial noise (SEED), so differences are weights,
    not luck.
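
The shared-noise point in the last bullet can be illustrated with a small, standard-library-only sketch. It assumes a hypothetical `SEED` value (the log does not state the real one) and uses scale factors as stand-ins for the three UNets. In PyTorch the analogous step would typically be a `torch.Generator` seeded with `manual_seed` and passed to `torch.randn`, but whether the evaluation script does exactly that is an assumption.

```python
import random

SEED = 42  # hypothetical value; the actual seed is not shown in the log


def initial_noise(seed: int, n_values: int) -> list[float]:
    """Draw n_values standard-normal samples from a privately seeded generator."""
    rng = random.Random(seed)  # independent of the global random state
    return [rng.gauss(0.0, 1.0) for _ in range(n_values)]


# Hypothetical stand-ins for the three models: each "model" is a scale factor.
models = {"before": 1.0, "prompt-JSON": 0.5, "vit-JSON": 2.0}

outputs = {}
for name, weight in models.items():
    # Re-seeding per model gives every model the identical starting noise.
    noise = initial_noise(SEED, 8)
    outputs[name] = [weight * x for x in noise]

noise_a = initial_noise(SEED, 8)
noise_b = initial_noise(SEED, 8)
print(noise_a == noise_b)  # identical seeds reproduce identical noise
```

Because the starting noise is identical for every column, any difference between the generated images in a row has to come from the weights being compared.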