TokForge: SD1.5 NPU "Fast" (DreamShaper-7, Qualcomm Hexagon)
SD1.5 image generation for the Qualcomm Hexagon NPU (HTP), packaged for on-device image
generation in the TokForge Android app (dev.tokforge). This is the "Fast" tier: the
quickest on-device image route (coherent 512×512 images in ~9–16 s, no root required).
The shipping checkpoint is DreamShaper-7 (Lykon/dreamshaper-7), a license-clean
(CreativeML-OpenRAIL-M) SD1.5 finetune. It is a quality + composition upgrade over base
Stable Diffusion 1.5: stronger aesthetics and better prompt-following on compositional prompts
(e.g. "two robots playing chess": base drops a robot, DreamShaper renders both). It uses the same
SD1.5 architecture and the same temb NPU IO contract as base SD1.5; only the UNet, the VAE decoder,
and the DreamShaper-specific time-embedding MLP weights differ. The CPU CLIP front-end is
bit-identical to base SD1.5 and is reused.
The model is quantized to W8A16 (8-bit weights, 16-bit activations) and compiled to QNN HTP context binaries that run image generation directly on the phone's Hexagon DSP.
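To give a feel for what W8A16 means numerically, here is a minimal sketch of symmetric per-tensor quantization at 8 and 16 bits. It is an illustration only: the actual quantization scheme used by the QNN toolchain for these binaries (per-channel vs per-tensor, symmetric vs asymmetric, calibration) is not described in this card, and the function names and sample values below are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Symmetric per-tensor quantization: returns (integer codes, scale).

    Illustrative only; not the scheme used to build the shipped binaries.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Made-up sample tensors, flattened to plain lists.
weights = [0.423, -1.27, 0.051, 0.907]
activations = [3.2, -0.004, 1.5, -2.75]

w_codes, w_scale = quantize_symmetric(weights, bits=8)       # "W8"
a_codes, a_scale = quantize_symmetric(activations, bits=16)  # "A16"

w_err = max(abs(w - d) for w, d in zip(weights, dequantize(w_codes, w_scale)))
a_err = max(abs(a - d) for a, d in zip(activations, dequantize(a_codes, a_scale)))
print(f"8-bit weight round-trip error:      {w_err:.6f}")
print(f"16-bit activation round-trip error: {a_err:.6f}")
```

The round-trip error is bounded by half the scale in each case, which is why the wider 16-bit activation grid loses far less precision than the 8-bit weight grid.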
Based on
Lykon/dreamshaper-7, itself an SD1.5 finetune of
stable-diffusion-v1-5/stable-diffusion-v1-5.
Format
These are per-architecture QNN HTP context binaries, one set per Hexagon arch (V73, V75, V79, V81). They are not a portable format like GGUF; each binary is compiled for a specific Hexagon generation. The app reads the device's Hexagon arch and selects the matching set.
Binaries are forward-compatible: a set built for a lower Hexagon arch also runs on a higher-arch DSP, while native-arch sets are preferred for best performance.
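The selection rule described above (prefer the native-arch set, otherwise fall back to a set built for a lower arch) can be sketched as follows. This is an assumption about how such a picker might look, not the app's actual code; `select_arch_set` and its arguments are hypothetical names.

```python
def select_arch_set(device_arch, available_sets):
    """Pick the binary set for a device.

    Returns the native-arch set if present, otherwise the highest set built
    for a lower arch (relying on the forward compatibility described above).
    Returns None if every available set targets a newer arch than the device.
    """
    def arch_number(name):
        # "V75" / "v75" -> 75
        return int(name.lower().lstrip("v"))

    device = arch_number(device_arch)
    candidates = [s for s in available_sets if arch_number(s) <= device]
    if not candidates:
        return None
    return max(candidates, key=arch_number)


sets = ["v73", "v75", "v79", "v81"]
print(select_arch_set("V75", sets))  # native set available
print(select_arch_set("V80", sets))  # hypothetical arch with no native set
print(select_arch_set("V69", sets))  # older than every shipped set
```

Taking the maximum of the eligible sets keeps the fallback as close to the device's own generation as possible.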
The DreamShaper-7 "Fast" bins live under the dreamshaper7/ directory (with their own
manifest.json); the app downloads this variant. The repo also retains the original base
SD1.5 ours-temb sets (top-level v73/…v81/ + root manifest.json) and the AI-Hub off-the-shelf
SD1.5 sets for breadth.
| File (per dreamshaper7/<arch>/ dir) | Role |
|---|---|
| unet.bin | UNet HTP context binary (DreamShaper-7 W8A16) |
| vae_decoder.bin | VAE decoder HTP context binary |
| text_encoder.bin | CLIP text-encoder QNN binary |
| time_mlp.bin | host time-embedding weights (DreamShaper-specific) |
| tokenizer.json, config.json | tokenizer + pipeline config |
The arch-independent CPU CLIP front-end (clip_sd15_base/clip_v2.mnn, token_emb.bin, pos_emb.bin)
is shared by every arch and reused across the base + DreamShaper variants.
See dreamshaper7/manifest.json for the authoritative per-arch file set (with per-file size + md5)
that the app uses to download the correct binaries for the device.
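For anyone checking a manual download, the size + md5 entries can be verified with a few lines of Python. The manifest layout assumed here (`archs` → `<arch>` → `files`, each with `name`, `size`, `md5`) is a guess for illustration; consult dreamshaper7/manifest.json for the real key names and adjust accordingly.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large binaries are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_arch_files(manifest, arch, directory):
    """Return the names of files that are missing or fail the size/md5 check.

    `manifest` is an already-parsed dict using the ASSUMED layout:
    {"archs": {"v75": {"files": [{"name": ..., "size": ..., "md5": ...}]}}}
    """
    bad = []
    for entry in manifest["archs"][arch]["files"]:
        path = os.path.join(directory, entry["name"])
        if (not os.path.isfile(path)
                or os.path.getsize(path) != entry["size"]
                or md5_of_file(path) != entry["md5"]):
            bad.append(entry["name"])
    return bad
```

A typical call would load the manifest with `json.load`, then run `verify_arch_files(manifest, "v75", "dreamshaper7/v75")`; an empty list means every listed file matched.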
Usage
This bundle is loaded automatically by the TokForge Android app; it is not a standalone diffusers checkpoint. The app detects the device's Hexagon arch, looks up the matching file set in the manifest, downloads those binaries, and runs them on the device NPU.
License & attribution
Released under CreativeML OpenRAIL-M, matching its base models.
This model is a derivative of Lykon/dreamshaper-7,
itself a finetune of
stable-diffusion-v1-5/stable-diffusion-v1-5.
Please retain this attribution and observe the CreativeML OpenRAIL-M use restrictions.