TokForge: SDXL IP-Adapter (Reference Identity) bundle

This bundle is the highest-fidelity reference-identity image route for the TokForge Android app, and the clean multi-subject path. Attach a photo of a person (or two), then render that person in any scene ("me as a superhero, unmasked, face visible"). The plus-face IP-Adapter transfers only the face, while the prompt drives the rest of the scene. SDXL's stronger plus-face transfer (vs SD1.5) gives sharper single-subject identity and lets the regional-mask two-subject path produce two distinct, recognizable faces.

This bundle runs on the on-device stable-diffusion.cpp engine (TokForge's IP-Adapter port) on CPU and Adreno OpenCL. Full SDXL is heavier than the SD1.5 IP-Adapter tier, so it is offered on 16 GB-class phones. For 8 GB phones, use the lighter darkmaniac7/TokForge-SD15-IPAdapter tier instead.

Files

File | Size | License | Contents
realvisxl-v40-lightning-fp16.safetensors | ~6.9 GB | OpenRAIL++ | RealVisXL V4.0 Lightning (SDXL photoreal finetune): dual CLIP text encoders + UNet + VAE in one self-contained f16 sd.cpp safetensors
ip-adapter-plus-face_sdxl_vit-h.safetensors | ~848 MB | Apache-2.0 | IP-Adapter plus-face SDXL (h94/IP-Adapter): 16-token image_proj Resampler + 70 decoupled cross-attn layers (cross_attention_dim 2048)
ip_adapter_clip_vision_vith.safetensors | ~2.5 GB | MIT | OpenCLIP ViT-H-14 image encoder (1280 hidden, 32 layers). The plus-face path needs ViT-H, not bigG; this is the same encoder the SD1.5 plus-face bundle ships

manifest.json and MD5SUMS carry the integrity hashes + render defaults.
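As a rough illustration of checking the downloaded files against MD5SUMS, here is a minimal Python sketch. It assumes the file uses the usual md5sum layout (one "<hash>  <filename>" pair per line), which this card does not state explicitly. The helper names md5_of and verify_md5sums are hypothetical, and the manifest.json schema is not covered.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB weights never sit in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(bundle_dir: str, sums_name: str = "MD5SUMS") -> dict:
    """Return {filename: True/False} for every entry listed in the sums file.

    Assumption: md5sum-style lines, "<32 hex chars>  <filename>".
    """
    root = Path(bundle_dir)
    results = {}
    for line in (root / sums_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = root / name
        # A missing file counts as a failed check rather than raising.
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling verify_md5sums on the bundle directory returns a per-file pass/fail map, so a partially downloaded 6.9 GB base shows up as a single False entry instead of an exception.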

Why this base, and why f16 (not Q4)

The base is RealVisXL V4.0 Lightning, the same self-contained SDXL safetensors the TokForge "RealVisXL SDXL Quality" tier ships: a strong photoreal SDXL finetune distilled for a few-step (6-step) Lightning floor. It is kept at f16 (half precision, unquantized) so the IP-Adapter's decoupled cross-attention and the face Resampler keep subject quality high. A q4_0 base measurably weakens the transferred identity, so this bundle deliberately uses f16 (matching the SD1.5 tier's quality choice).

Why plus-face + ViT-H (not the base SDXL adapter + bigG)

The standard ip-adapter_sdxl projects the whole pooled CLIP-bigG embedding, so it drags the reference's entire scene through. The plus-face variant (ip-adapter-plus-face_sdxl_vit-h) runs a 16-token Resampler over the ViT-H penultimate hidden state, so it extracts the face only: identity is preserved while the prompt controls the scene. Because this adapter is the _vit-h build (image_proj.latents shape [1, 16, 1280]), it pairs with the ViT-H encoder (1280 hidden), not the bigG encoder (1664 hidden) the base SDXL adapter uses. The TokForge sd.cpp IP-Adapter loader auto-detects plus-face by image_proj.latents and the SDXL adapter config (2048-dim, 70 layers) by the SDXL base.
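To see what a detection keyed on image_proj.latents can look like, the sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names and shapes) and reports whether that tensor is present. It is an illustration, not TokForge's loader: describe_adapter and its return fields are hypothetical names, and the last dimension is reported as-is without interpreting it.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a .safetensors file without loading tensors.

    Layout: 8-byte little-endian header length, then that many bytes of JSON
    mapping tensor name -> {"dtype", "shape", "data_offsets"}.
    """
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        return json.loads(fh.read(header_len))


def describe_adapter(path: str) -> dict:
    """Classify an IP-Adapter file from tensor names and shapes alone."""
    header = read_safetensors_header(path)
    latents = header.get("image_proj.latents")
    if latents is None:
        # No Resampler latents: treated here as a non-plus adapter.
        return {"plus": False, "tokens": None, "width": None}
    _, tokens, width = latents["shape"]
    return {"plus": True, "tokens": tokens, "width": width}
```

For the adapter in this bundle, the shape quoted above ([1, 16, 1280]) would come back as tokens 16 and width 1280.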

How TokForge uses it

In the app (16 GB+ phones): Image model picker → download "SDXL IP-Adapter (Reference Identity)" → attach a face photo as a reference under chat → prompt the scene. The engine is invoked as:

sd -M img_gen \
   -m realvisxl-v40-lightning-fp16.safetensors \
   -p "as a superhero, unmasked, face visible, detailed face" \
   -n "<strong negative>" \
   --clip_vision ip_adapter_clip_vision_vith.safetensors \
   --ip-adapter ip-adapter-plus-face_sdxl_vit-h.safetensors \
   --ip-adapter-image <your_face.jpg> \
   --ip-adapter-scale 0.75 \
   --cfg-scale 2.0 --sampling-method euler --scheduler discrete \
   --steps 6 -H 1024 -W 1024
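
For scripting the same invocation outside the app, a small Python sketch can assemble the argument list so that paths with spaces are passed safely. The flags and file names are copied from the command above; the function build_sd_args and its parameters are hypothetical, and the --ip-adapter* flags are only expected to exist in TokForge's sd.cpp port.

```python
import shlex


def build_sd_args(face_image: str, prompt: str, negative: str,
                  ip_scale: float = 0.75, steps: int = 6,
                  cfg_scale: float = 2.0, size: int = 1024) -> list:
    """Assemble the argv list for the invocation shown above."""
    return [
        "sd", "-M", "img_gen",
        "-m", "realvisxl-v40-lightning-fp16.safetensors",
        "-p", prompt,
        "-n", negative,
        "--clip_vision", "ip_adapter_clip_vision_vith.safetensors",
        "--ip-adapter", "ip-adapter-plus-face_sdxl_vit-h.safetensors",
        "--ip-adapter-image", face_image,
        "--ip-adapter-scale", str(ip_scale),
        "--cfg-scale", str(cfg_scale),
        "--sampling-method", "euler",
        "--scheduler", "discrete",
        "--steps", str(steps),
        "-H", str(size), "-W", str(size),
    ]


args = build_sd_args(
    "my face.jpg",
    "as a superhero, unmasked, face visible, detailed face",
    "blurry",
)
# shlex.join quotes each argument for display; pass `args` itself to
# subprocess.run(args) to execute without going through a shell.
print(shlex.join(args))
```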

Recommended render settings

Setting | Value
sampler | euler
scheduler | discrete
steps | 6 (Lightning few-step floor)
cfg-scale | 2.0 (Lightning low-CFG)
ip-adapter-scale | 0.75 (keeps the scene with strong recognizable identity; lower ≈ more scene freedom, higher ≈ closer to the reference)
resolution | 1024×1024 (SDXL native)
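
For intuition about what ip-adapter-scale changes, the published IP-Adapter formulation adds a second, image-conditioned attention term to each cross-attention layer and weights it by a scalar. The block below restates that formula; that TokForge's port follows it and maps --ip-adapter-scale to λ is an assumption, since this card does not spell it out.

```latex
% Decoupled cross-attention (IP-Adapter formulation).
% Q: queries from the UNet latent; K, V: keys/values from the text embedding;
% K', V': keys/values from the image tokens (here, the 16 Resampler tokens);
% d: key dimension; \lambda: the adapter scale (assumed = --ip-adapter-scale).
Z = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V
  + \lambda \, \operatorname{softmax}\!\left(\frac{Q K'^{\top}}{\sqrt{d}}\right) V'
```

With λ = 0 the image term vanishes and the layer reduces to ordinary text cross-attention, which is consistent with lower values giving the prompt more freedom.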

Plus-face transfers the face only, so the rendered face must stay visible and unobstructed for a recognizable identity. Keep the face in frame ("unmasked, face visible, detailed face, looking at viewer"); the app appends this cue automatically.
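
When driving the engine directly rather than through the app, the same cue has to be added by hand. The sketch below shows one way to append it without duplicating phrases already in the prompt. It is a hypothetical helper, not the app's actual logic; only the cue text is taken from this card.

```python
FACE_CUE = "unmasked, face visible, detailed face, looking at viewer"


def with_face_cue(prompt: str, cue: str = FACE_CUE) -> str:
    """Append each cue phrase that the prompt does not already contain."""
    # Compare on comma-separated phrases, ignoring case and padding.
    have = {part.strip().lower() for part in prompt.split(",")}
    missing = [c.strip() for c in cue.split(",") if c.strip().lower() not in have]
    base = prompt.strip().rstrip(",")
    return ", ".join([base] + missing) if base else ", ".join(missing)
```

A prompt that already contains "unmasked" gets only the remaining three phrases appended, and an empty prompt becomes the cue itself.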

Licenses

This is an aggregate of three independently licensed components; each retains its own license:

  • RealVisXL V4.0 Lightning base (realvisxl-v40-lightning-fp16.safetensors): OpenRAIL++ (SG161222/RealVisXL_V4.0_Lightning, the SDXL openrail++ license). Use must comply with the OpenRAIL++ use-based restrictions.
  • IP-Adapter plus-face (SDXL, ViT-H) (ip-adapter-plus-face_sdxl_vit-h.safetensors): Apache-2.0 (h94/IP-Adapter).
  • OpenCLIP ViT-H-14 image encoder (ip_adapter_clip_vision_vith.safetensors): MIT (OpenCLIP / LAION ViT-H-14).

The non-commercial IP-Adapter-FaceID / InsightFace path is NOT used here β€” only the Apache-2.0 base + plus-face adapters from h94/IP-Adapter.

Provenance

  • Base: realvisxl-v40-lightning-fp16.safetensors, the self-contained sd.cpp SDXL safetensors built from SG161222/RealVisXL_V4.0_Lightning (the same base the TokForge RealVisXL SDXL tier ships).
  • Adapter + image encoder copied verbatim from h94/IP-Adapter (sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors, models/image_encoder/model.safetensors).