TokForge: SD1.5 IP-Adapter (Reference Identity) bundle
The reference-identity image route for the TokForge Android app. Attach a photo of a person, then render that person in any scene ("me as a superhero flying over New York"). The plus-face IP-Adapter transfers the face only while the prompt drives the whole scene.
This bundle runs on the on-device stable-diffusion.cpp
GGUF engine (TokForge's IP-Adapter port) on CPU and Adreno OpenCL. SD1.5 is
light enough for any 8 GB+ phone, which makes this the broadest-reach identity tier
(lighter than the SDXL PhotoMaker tier).
Files
| File | Size | License | Contents |
|---|---|---|---|
| sd15-base-f16.gguf | ~2.2 GB | CreativeML-OpenRAIL-M | DreamShaper-7 (SD1.5 realistic finetune): CLIP text encoder + UNet + VAE in one f16 GGUF |
| ip-adapter-plus-face_sd15.safetensors | ~98 MB | Apache-2.0 | IP-Adapter plus-face (h94/IP-Adapter): 16-token Resampler + decoupled cross-attn |
| ip_adapter_clip_vision_vith.safetensors | ~2.5 GB | MIT | OpenCLIP ViT-H-14 image encoder (the plus-face path needs ViT-H, not bigG) |
manifest.json and MD5SUMS carry the integrity hashes + render defaults.
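As a minimal sketch of checking a download against MD5SUMS, the snippet below assumes the file uses the common md5sum layout (one `<hash>  <filename>` pair per line); that layout is an assumption, not something this README states. The function names `md5_of` and `verify_md5sums` are illustrative and not part of TokForge.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-GB weights never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(bundle_dir: Path, sums_name: str = "MD5SUMS") -> dict[str, bool]:
    """Return {filename: matches} for every entry in the checksum file.

    Assumes md5sum-style lines: "<32 hex chars>  <filename>".
    """
    results = {}
    for line in (bundle_dir / sums_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = bundle_dir / name
        # A missing file counts as a failed check rather than raising.
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling `verify_md5sums(Path("."))` from the bundle directory returns a mapping that can be scanned for any `False` entry before the models are loaded.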
Why this base, and why f16 (not Q4)
The base is the standard, non-LCM DreamShaper-7, the same realistic SD1.5 finetune
TokForge ships on its other image tiers. It is converted at f16 (unquantized 16-bit
floats) so the IP-Adapter's decoupled cross-attention and the face Resampler keep
subject quality high. A q4_0/emaonly base measurably weakens the transferred identity,
so this bundle deliberately uses f16.
Why plus-face (not the base adapter)
The base ip-adapter_sd15 projects the whole pooled CLIP embedding (4 tokens), so it
drags the reference's entire scene through (a car selfie came out "the person in his car").
The plus-face Resampler extracts the face only (16 tokens from the ViT-H penultimate
hidden state), so identity is preserved while the prompt controls the scene. The TokForge
sd.cpp IP-Adapter loader auto-detects plus-face by the presence of image_proj.latents.
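The detection rule can be reproduced outside the engine. The sketch below reads only the JSON header of a .safetensors file (an 8-byte little-endian length followed by that many bytes of JSON) and looks for a tensor whose name starts with image_proj.latents. It illustrates the rule described above and is not the loader's actual code; both function names are hypothetical.

```python
import json
import struct
from pathlib import Path


def safetensors_tensor_names(path: Path) -> list[str]:
    """Read tensor names from a .safetensors header without loading weights.

    Layout: 8-byte little-endian header length, then that many bytes of JSON.
    """
    with path.open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def looks_like_plus_face(path: Path) -> bool:
    """Heuristic mirroring the rule stated above: a Resampler checkpoint
    carries an image_proj.latents tensor; the base adapter does not."""
    names = safetensors_tensor_names(path)
    return any(name.startswith("image_proj.latents") for name in names)
```

Because only the header is parsed, the check stays cheap even for large checkpoints.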
How TokForge uses it
In the app: Image model picker → download "SD1.5 IP-Adapter (Reference Identity)" → attach a face photo as a reference under chat → prompt the scene. The engine is invoked as:
sd -M img_gen \
-m sd15-base-f16.gguf \
-p "as a superhero flying over New York" \
-n "<strong negative>" \
--clip_vision ip_adapter_clip_vision_vith.safetensors \
--ip-adapter ip-adapter-plus-face_sd15.safetensors \
--ip-adapter-image <your_face.jpg> \
--ip-adapter-scale 0.6 \
--cfg-scale 7.0 --sampling-method euler_a --scheduler discrete \
--steps 30 -H 512 -W 512
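For scripting the same invocation from a desktop test harness, a sketch like the following assembles the argument list from the flags shown above. The `RenderSettings` class and `build_sd_command` function are hypothetical helpers, not part of TokForge or stable-diffusion.cpp, and the `--ip-adapter*` flags are taken from this README (they belong to the TokForge port).

```python
from dataclasses import dataclass


@dataclass
class RenderSettings:
    # Defaults copied from the recommended settings in this README.
    sampler: str = "euler_a"
    scheduler: str = "discrete"
    steps: int = 30
    cfg_scale: float = 7.0
    ip_adapter_scale: float = 0.6
    width: int = 512
    height: int = 512


def build_sd_command(prompt, negative, face_image, settings=None, binary="sd"):
    """Return the argv list for one reference-identity render."""
    s = settings or RenderSettings()
    return [
        binary, "-M", "img_gen",
        "-m", "sd15-base-f16.gguf",
        "-p", prompt,
        "-n", negative,
        "--clip_vision", "ip_adapter_clip_vision_vith.safetensors",
        "--ip-adapter", "ip-adapter-plus-face_sd15.safetensors",
        "--ip-adapter-image", face_image,
        "--ip-adapter-scale", str(s.ip_adapter_scale),
        "--cfg-scale", str(s.cfg_scale),
        "--sampling-method", s.sampler,
        "--scheduler", s.scheduler,
        "--steps", str(s.steps),
        "-H", str(s.height), "-W", str(s.width),
    ]
```

Passing the resulting list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with prompts that contain spaces or quotes.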
Recommended render settings
| Setting | Value |
|---|---|
| sampler | euler_a |
| scheduler | discrete |
| steps | 30 (full quality; fewer = faster) |
| cfg-scale | 7.0 |
| ip-adapter-scale | 0.6 (about 0.5 to 0.6 keeps the scene with a recognizable identity; ~0.8 reconstructs the reference) |
| resolution | 512×512 (SD1.5 native) |
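The effect of ip-adapter-scale can be read off the decoupled cross-attention that IP-Adapter adds to each cross-attention layer. As a sketch, assuming the TokForge port follows the standard IP-Adapter formulation (the symbols below are notation chosen here, not names from the engine):

$$
Z = \mathrm{Attn}\left(Q, K_{\text{text}}, V_{\text{text}}\right) + \lambda \cdot \mathrm{Attn}\left(Q, K_{\text{img}}, V_{\text{img}}\right)
$$

Here $Q$ comes from the UNet features, $K_{\text{text}}, V_{\text{text}}$ from the prompt embedding, $K_{\text{img}}, V_{\text{img}}$ from the 16 face tokens, and $\lambda$ is ip-adapter-scale. At $\lambda = 0$ the layer reduces to plain text-conditioned attention, while larger values give the reference face more weight relative to the prompt, which is consistent with the behavior noted in the table.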
Licenses
This is an aggregate of three independently-licensed components; each retains its own license:
- DreamShaper-7 base (sd15-base-f16.gguf): CreativeML-OpenRAIL-M (Lykon/dreamshaper-7). Use must comply with the OpenRAIL-M use-based restrictions.
- IP-Adapter plus-face (ip-adapter-plus-face_sd15.safetensors): Apache-2.0 (h94/IP-Adapter).
- OpenCLIP ViT-H-14 image encoder (ip_adapter_clip_vision_vith.safetensors): MIT (OpenCLIP / LAION ViT-H-14).

The non-commercial IP-Adapter-FaceID / InsightFace path is NOT used here; only the Apache-2.0 base + plus-face adapters from h94/IP-Adapter are.
Provenance
- Base converted from Lykon/dreamshaper-7 (diffusers) to a single f16 GGUF via the TokForge stable-diffusion.cpp convert path (-M convert --type f16).
- Adapter + image encoder copied verbatim from h94/IP-Adapter (models/ip-adapter-plus-face_sd15.safetensors, models/image_encoder/model.safetensors).