JoyAI-Image-Edit-Plus (ComfyUI weights)

Single-file .safetensors checkpoints of JoyAI-Image-Edit-Plus, repackaged for native ComfyUI support (no custom node required).

JoyAI-Image-Edit-Plus is the multi-image instruction-guided editing model of the JoyAI-Image family. It accepts 1–6 reference images and a text instruction, and generates a new image that combines elements from the references according to the instruction.

Files

File | Size | Goes into | Component
diffusion_models/joy_image_edit_plus_bf16.safetensors | ~31 GB | ComfyUI/models/diffusion_models/ | JoyImageEditPlusTransformer3DModel (bf16)
text_encoders/qwen3vl_joyimage_bf16.safetensors | ~17 GB | ComfyUI/models/text_encoders/ | Qwen3-VL-8B text encoder (bf16)
vae/joy_image_edit_vae.safetensors | ~243 MB | ComfyUI/models/vae/ | AutoencoderKLWan

The repo layout already matches ComfyUI/models/, so a single hf download into your models root drops every file where it needs to go.
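
To confirm that the download landed where ComfyUI expects it, a small check like the one below can help. It is a minimal sketch: the relative paths come from the Files table, while the helper name `missing_weights` is made up for this example and is not part of ComfyUI or the repo.

```python
from pathlib import Path

# Relative paths taken from the Files table above.
EXPECTED_FILES = (
    "diffusion_models/joy_image_edit_plus_bf16.safetensors",
    "text_encoders/qwen3vl_joyimage_bf16.safetensors",
    "vae/joy_image_edit_vae.safetensors",
)


def missing_weights(models_root):
    """Return the expected relative paths that are absent under models_root.

    Hypothetical helper for illustration; models_root is your ComfyUI/models directory.
    """
    root = Path(models_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_weights("/path/to/ComfyUI/models")` returns an empty list when all three files are in place, and otherwise lists the paths that still need downloading.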

Model architecture

  • Transformer: 40-layer DiT, hidden size 4096, 32 heads, in/out channels 16, patch size [1, 2, 2], 3D RoPE (rope_dim_list = [16, 56, 56], theta 10000). Each reference image is patchified independently and concatenated on the sequence dimension with a per-image temporal offset in the 3D RoPE grid, so references may differ in resolution.
  • Text encoder: Qwen3VLForConditionalGeneration (text dim 4096). The instruction is wrapped with one <|vision_start|><|image_pad|><|vision_end|> block per reference image.
  • VAE: AutoencoderKLWan (z_dim 16, spatial downscale 8, temporal downscale 4), the same VAE used by the single-image edit model.
  • Scheduler: FlowMatch (Euler), sampling shift 1.5.
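
The numbers in this list determine how many tokens each reference contributes: a spatial downscale of 8 followed by a 2×2 patch means one token per 16×16 pixel block. The sketch below works through that arithmetic and shows one way differently sized references could share a 3D position grid. The constants come from the list above; the choice of temporal index `i` for reference `i` is an assumption made for illustration, and the model's actual offset scheme may differ.

```python
# Constants from the architecture list above.
HIDDEN_SIZE = 4096
NUM_HEADS = 32
ROPE_DIM_LIST = [16, 56, 56]  # temporal, height, width
VAE_SPATIAL_DOWNSCALE = 8
PATCH_H, PATCH_W = 2, 2

# The three RoPE axes together span one attention head: 16 + 56 + 56 = 128.
assert sum(ROPE_DIM_LIST) == HIDDEN_SIZE // NUM_HEADS


def token_grid(height, width):
    """Token grid (rows, cols) for one image of the given pixel size."""
    step_h = VAE_SPATIAL_DOWNSCALE * PATCH_H
    step_w = VAE_SPATIAL_DOWNSCALE * PATCH_W
    if height % step_h or width % step_w:
        raise ValueError("height and width must be multiples of 16")
    return height // step_h, width // step_w


def rope_positions(reference_sizes):
    """Yield (t, h, w) position triples for every reference token.

    Assumption: reference i uses temporal index i. The real offsets
    (for example, where the target image sits) are not specified here.
    """
    for t, (height, width) in enumerate(reference_sizes):
        rows, cols = token_grid(height, width)
        for h in range(rows):
            for w in range(cols):
                yield (t, h, w)
```

For example, references of 1024×1024 and 768×1344 pixels give grids of 64×64 and 48×84 tokens, so `len(list(rope_positions([(1024, 1024), (768, 1344)])))` is 4096 + 4032 = 8128. Because each reference keeps its own height and width range and differs only in the temporal index, the grids do not need to match.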

Weight names are byte-identical to the diffusers checkpoint (894 transformer keys, zero renaming); ComfyUI auto-detects the model as joyimage.
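
Tensor names can be inspected without loading any weights, because a .safetensors file starts with an 8-byte little-endian length followed by a JSON header. The sketch below reads only that header with the Python standard library; the function name `tensor_names` is made up for this example.

```python
import json
import struct


def tensor_names(path):
    """Return sorted tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian uint64 giving the JSON header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")
```

If the key count quoted above refers to the transformer file, `len(tensor_names(".../diffusion_models/joy_image_edit_plus_bf16.safetensors"))` should come out as 894, and the returned names can be compared directly against those of the diffusers checkpoint.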

Installation

The model needs no custom nodes, but native support is still only proposed upstream in Comfy-Org/ComfyUI#14428. Until that PR is merged, install the fork branch:

git clone -b joyimage-edit-pr https://github.com/feice-huang/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Once the PR is merged upstream, the stock ComfyUI release will run these weights with no fork needed.

Then download the weights straight into ComfyUI/models/:

hf download jdopensource/JoyAI-Image-Edit-Plus-ComfyUI \
  --local-dir /path/to/ComfyUI/models

Restart ComfyUI.

Usage

Example workflow: workflow_joyimage_edit.json

Build the graph from these native nodes:

  1. Load Diffusion Model (UNETLoader) → diffusion_models/joy_image_edit_plus_bf16.safetensors
  2. Load CLIP (CLIPLoader) → text_encoders/qwen3vl_joyimage_bf16.safetensors, type joyimage
  3. Load VAE (VAELoader) → vae/joy_image_edit_vae.safetensors
  4. Load Image (LoadImage) for each reference (1–6)
  5. TextEncodeJoyImageEditPlus: feed clip, vae, the instruction, and the reference images into image1…image6. Wire one instance for the positive prompt and one (empty prompt, same images) for the negative. Each node bucket-resizes the references to the 1024-base buckets, VAE-encodes them, and appends the reference latents to the conditioning; its image output feeds VAEDecode / empty-latent sizing.
  6. KSampler → VAEDecode → SaveImage
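
The bucket resize in step 5 can be pictured as picking, for each reference, the target size whose aspect ratio is closest to the input. The sketch below is illustrative only: the node's real bucket list is not given here, so `make_buckets` builds a hypothetical one (sizes that are multiples of 16, matching VAE downscale 8 × patch size 2, with area near 1024×1024), and both function names are invented for this example.

```python
import math

BASE = 1024    # "1024-base" from the step above
MULTIPLE = 16  # VAE spatial downscale 8 x patch size 2


def make_buckets(max_ratio=2.0):
    """Hypothetical bucket list: (width, height) pairs with area near BASE**2."""
    buckets = set()
    for width in range(MULTIPLE, int(BASE * max_ratio) + 1, MULTIPLE):
        # Choose the height that keeps the area close to BASE * BASE.
        height = round(BASE * BASE / width / MULTIPLE) * MULTIPLE
        if height and 1 / max_ratio <= width / height <= max_ratio:
            buckets.add((width, height))
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the input, in log space."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))
```

With this list, any square input maps to (1024, 1024), and a landscape input maps to a wider bucket of roughly the same pixel count. Comparing ratios in log space treats "twice as wide" and "twice as tall" symmetrically.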

Recommended parameters

Parameter | Value
Steps | 30
CFG | 4.0
Sampler | euler
Scheduler | simple
dtype | bf16
Resolution | auto (1024-base buckets, per reference)
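
To make these settings concrete, the sketch below shows how steps, CFG, the Euler sampler, and the sampling shift of 1.5 interact in a generic flow-matching sampler. It is not ComfyUI's implementation. It assumes the common time-shift formula `shift * s / (1 + (shift - 1) * s)`, uses evenly spaced sigmas as a stand-in for the simple scheduler, and `velocity_fn` is a hypothetical placeholder for the model.

```python
def shifted_sigmas(steps, shift=1.5):
    """Evenly spaced sigmas from 1 to 0, warped by the sampling shift (assumed formula)."""
    sigmas = []
    for i in range(steps + 1):
        s = 1.0 - i / steps
        sigmas.append(shift * s / (1.0 + (shift - 1.0) * s))
    return sigmas


def cfg_combine(cond, uncond, cfg=4.0):
    """Classifier-free guidance: extrapolate from the negative prediction toward the positive."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


def euler_sample(x, velocity_fn, steps=30, cfg=4.0, shift=1.5):
    """Integrate dx/dsigma = v with Euler steps from sigma=1 (noise) to sigma=0.

    velocity_fn(x, sigma, positive) is a hypothetical stand-in for the model:
    positive=True uses the instruction, positive=False the empty prompt.
    """
    sigmas = shifted_sigmas(steps, shift)
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        v = cfg_combine(velocity_fn(x, sigma, True), velocity_fn(x, sigma, False), cfg)
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x
```

A shift above 1 keeps the schedule at high noise levels for longer, so more of the 30 steps go to overall composition than to fine detail. The two `velocity_fn` calls per step correspond to the positive and negative TextEncodeJoyImageEditPlus instances in the graph.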

Example

Prompt: "The woman is lovingly holding the cute puppy in her arms"

[Images: Input 0, Input 1, Output]

Model details

  • Developed by: JD.com
  • License: Apache-2.0
  • Framework: PyTorch / ComfyUI

Links

Citation

@misc{joyai-image-2025,
  title={JoyAI-Image: A Unified Multimodal Foundation Model for Image Understanding, Generation, and Editing},
  author={Joy Future Academy, JD},
  year={2025},
  url={https://github.com/jd-opensource/JoyAI-Image}
}