TokForge: SD1.5 IP-Adapter (Reference Identity) bundle

The reference-identity image route for the TokForge Android app. Attach a photo of a person, then render that person in any scene ("me as a superhero flying over New York"). The plus-face IP-Adapter transfers only the face, while the prompt drives the rest of the scene.

This bundle runs on the on-device stable-diffusion.cpp GGUF engine (TokForge's IP-Adapter port) on CPU and Adreno OpenCL. SD1.5 is light enough for any 8 GB+ phone, which makes this the broadest-reach identity tier (lighter than the SDXL PhotoMaker tier).

Files

| File | Size | License | Contents |
|---|---|---|---|
| sd15-base-f16.gguf | ~2.2 GB | CreativeML-OpenRAIL-M | DreamShaper-7 (SD1.5 realistic finetune): CLIP text encoder + UNet + VAE in one f16 GGUF |
| ip-adapter-plus-face_sd15.safetensors | ~98 MB | Apache-2.0 | IP-Adapter plus-face (h94/IP-Adapter): 16-token Resampler + decoupled cross-attn |
| ip_adapter_clip_vision_vith.safetensors | ~2.5 GB | MIT | OpenCLIP ViT-H-14 image encoder (the plus-face path needs ViT-H, not bigG) |

manifest.json and MD5SUMS carry the integrity hashes + render defaults.
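
As an illustration of how the hashes might be checked after download, here is a minimal Python sketch. It assumes MD5SUMS uses the usual md5sum layout (`<hex digest>  <filename>` per line); the bundle does not document the exact layout, so treat that as an assumption. The helper names `md5_of` and `verify_md5sums` are hypothetical and not part of TokForge.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB model files are never fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(bundle_dir: str, sums_name: str = "MD5SUMS") -> dict[str, bool]:
    """Return {filename: matches} for every entry in the checksum file.

    Assumption: md5sum-style lines, "<hex digest>  <filename>".
    A file that is listed but missing on disk is reported as False.
    """
    root = Path(bundle_dir)
    results: dict[str, bool] = {}
    for line in (root / sums_name).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum prefixes binary-mode names with '*'
        target = root / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling `verify_md5sums("path/to/bundle")` returns one boolean per listed file, so a caller can refuse to load the bundle if any value is False.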

Why this base, and why f16 (not Q4)

The base is the standard, non-LCM DreamShaper-7, the same realistic SD1.5 finetune TokForge ships on its other image tiers. It is converted at f16 (unquantized 16-bit half-precision floats) so the IP-Adapter's decoupled cross-attention and the face Resampler keep subject quality high. A q4_0/emaonly base measurably weakens the transferred identity, so this bundle deliberately uses f16.

Why plus-face (not the base adapter)

The base ip-adapter_sd15 projects the whole pooled CLIP embedding (4 tokens), so it drags the reference's entire scene through: a car selfie came out as "the person in his car". The plus-face Resampler extracts the face only (16 tokens from the ViT-H penultimate hidden state), so identity is preserved while the prompt controls the scene. The TokForge sd.cpp IP-Adapter loader auto-detects plus-face by the presence of image_proj.latents.
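
To make the detection rule concrete, here is a small Python sketch that performs the same kind of check on a .safetensors file. It is not the TokForge loader (which lives in the C++ engine). It relies only on the public safetensors layout: an 8-byte little-endian header length followed by a JSON header keyed by tensor name. The function names are hypothetical, and treating `image_proj.latents` as a name prefix is an assumption.

```python
import json
import struct


def safetensors_tensor_names(path: str) -> list[str]:
    """Read tensor names from a .safetensors header without loading any weights."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian u64 length
        header = json.loads(fh.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def has_resampler_latents(path: str) -> bool:
    """True if a tensor named image_proj.latents (the Resampler's learned queries) exists.

    Assumption: matching by prefix is enough; the real loader may match differently.
    """
    return any(
        name.startswith("image_proj.latents")
        for name in safetensors_tensor_names(path)
    )
```

Under this sketch, `has_resampler_latents("ip-adapter-plus-face_sd15.safetensors")` would be expected to return True for a Resampler-based adapter and False for the base adapter, which has no latents tensor.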

How TokForge uses it

In the app: Image model picker → download "SD1.5 IP-Adapter (Reference Identity)" → attach a face photo as a reference under chat → prompt the scene. The engine is invoked as:

sd -M img_gen \
   -m sd15-base-f16.gguf \
   -p "as a superhero flying over New York" \
   -n "<strong negative>" \
   --clip_vision ip_adapter_clip_vision_vith.safetensors \
   --ip-adapter ip-adapter-plus-face_sd15.safetensors \
   --ip-adapter-image <your_face.jpg> \
   --ip-adapter-scale 0.6 \
   --cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \
   --steps 30 -H 512 -W 512

Recommended render settings

| Setting | Value |
|---|---|
| sampler | euler_a |
| scheduler | discrete |
| steps | 30 (full quality; fewer = faster) |
| cfg-scale | 7.0 |
| ip-adapter-scale | 0.6 (≈0.5–0.6 keeps the scene with recognizable identity; ~0.8 reconstructs the reference) |
| resolution | 512×512 (SD1.5 native) |
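
For scripted experiments outside the app, the settings above can be turned into an argument list. The Python sketch below is only an illustration: the flag names and file names are copied from the engine command shown earlier, while `build_sd_args`, `DEFAULTS`, and the assumption that a binary named `sd` is on the PATH are hypothetical.

```python
import shlex

# Defaults copied from the recommended render settings.
DEFAULTS = {
    "--ip-adapter-scale": "0.6",
    "--cfg-scale": "7.0",
    "--sampling-method": "euler_a",
    "--scheduler": "discrete",
    "--steps": "30",
    "-H": "512",
    "-W": "512",
}


def build_sd_args(prompt, negative, face_image, scale=None, steps=None, binary="sd"):
    """Build an argv list for the engine; the caller decides how to run it."""
    opts = dict(DEFAULTS)  # copy so the module-level defaults are never mutated
    if scale is not None:
        opts["--ip-adapter-scale"] = f"{scale:g}"
    if steps is not None:
        if steps < 1:
            raise ValueError("steps must be at least 1")
        opts["--steps"] = str(int(steps))
    args = [
        binary, "-M", "img_gen",
        "-m", "sd15-base-f16.gguf",
        "-p", prompt,
        "-n", negative,
        "--clip_vision", "ip_adapter_clip_vision_vith.safetensors",
        "--ip-adapter", "ip-adapter-plus-face_sd15.safetensors",
        "--ip-adapter-image", face_image,
    ]
    for flag, value in opts.items():
        args += [flag, value]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command for a stronger identity pull at fewer steps.
    print(shlex.join(build_sd_args("as a superhero flying over New York",
                                   "blurry, deformed", "face.jpg",
                                   scale=0.8, steps=20)))
```

Returning a list rather than a single string keeps prompts with spaces or quotes intact when the list is passed to `subprocess.run`.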

Licenses

This is an aggregate of three independently licensed components; each retains its own license:

  • DreamShaper-7 base (sd15-base-f16.gguf): CreativeML-OpenRAIL-M (Lykon/dreamshaper-7). Use must comply with the OpenRAIL-M use-based restrictions.
  • IP-Adapter plus-face (ip-adapter-plus-face_sd15.safetensors): Apache-2.0 (h94/IP-Adapter).
  • OpenCLIP ViT-H-14 image encoder (ip_adapter_clip_vision_vith.safetensors): MIT (OpenCLIP / LAION ViT-H-14).

The non-commercial IP-Adapter-FaceID / InsightFace path is NOT used here: only the Apache-2.0 base + plus-face adapters from h94/IP-Adapter.

Provenance

  • Base converted from Lykon/dreamshaper-7 (diffusers) to a single f16 GGUF via the TokForge stable-diffusion.cpp convert path (-M convert --type f16).
  • Adapter + image encoder copied verbatim from h94/IP-Adapter (models/ip-adapter-plus-face_sd15.safetensors, models/image_encoder/model.safetensors).