TokForge: SD1.5 NPU "Fast" (DreamShaper-7, Qualcomm Hexagon)

SD1.5 for the Qualcomm Hexagon NPU (HTP), packaged for on-device image generation in the TokForge Android app (dev.tokforge). This is the "Fast" tier, the quickest on-device image route (coherent 512×512 in ~9–16 s, no root).

The shipping checkpoint is DreamShaper-7 (Lykon/dreamshaper-7), a license-clean (CreativeML-OpenRAIL-M) SD1.5 finetune. It is a quality and composition upgrade over base Stable Diffusion 1.5, with stronger aesthetics and better prompt-following on compositional prompts (e.g. for "two robots playing chess", base drops a robot while DreamShaper renders both). It uses the same SD1.5 architecture and the same temb NPU IO contract as base SD1.5; only the UNet, the VAE decoder, and the DreamShaper-specific time-embedding MLP weights differ. The CPU CLIP front-end is bit-identical to base SD1.5 and is reused.

The model is quantized to W8A16 (8-bit weights, 16-bit activations) and compiled to QNN HTP context binaries that run image generation directly on the phone's Hexagon DSP.
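To make the W8A16 label concrete, here is a minimal sketch of what "8-bit weights, 16-bit activations" means numerically. It assumes symmetric quantization with a single scale per tensor, purely for illustration; this card does not say which scheme the QNN toolchain actually uses for this model (for example per-channel or asymmetric), and the function names are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Quantize floats to signed integers of the given width with one shared scale.

    Illustrative assumption: symmetric, per-tensor quantization.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in quantized]


# Toy tensors standing in for a weight tensor and an activation tensor.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [0.1, 2.0, -0.3, 1.5]

w_q, w_scale = quantize_symmetric(weights, bits=8)        # the "W8" part
a_q, a_scale = quantize_symmetric(activations, bits=16)   # the "A16" part

# Worst-case round-trip error for each tensor.
w_err = max(abs(w - d) for w, d in zip(weights, dequantize(w_q, w_scale)))
a_err = max(abs(a - d) for a, d in zip(activations, dequantize(a_q, a_scale)))
print(w_q, w_err, a_err)
```

In this toy setup the 16-bit activation error comes out far smaller than the 8-bit weight error, which illustrates the trade-off the label describes: weights are stored compactly while activations keep more precision.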

Based on

Lykon/dreamshaper-7, itself an SD1.5 finetune of stable-diffusion-v1-5/stable-diffusion-v1-5.

Format

These are per-architecture QNN HTP context binaries, one set per Hexagon arch (V73, V75, V79, V81). They are not a portable format like GGUF β€” each binary is compiled for a specific Hexagon generation. The app reads the device's Hexagon arch and selects the matching set.

Binaries are forward-compatible: a set built for a lower Hexagon arch also runs on a higher-arch DSP, while native-arch sets are preferred for best performance.
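The selection rule described above (prefer the native set, otherwise fall back to a set built for a lower arch) can be sketched as follows. This is an illustration only, not the app's actual code: `select_arch_set` is a hypothetical name, arch numbers are modeled as plain integers, and picking the highest compatible lower set as the fallback is an assumption.

```python
# Arch sets published in this repo, per the Format section (V73, V75, V79, V81).
AVAILABLE_ARCHS = (73, 75, 79, 81)


def select_arch_set(device_arch, available=AVAILABLE_ARCHS):
    """Return the arch set to load for a device, or None if none is compatible.

    Prefers the native set; otherwise falls back to the highest set built
    for a lower arch, relying on forward compatibility.
    """
    compatible = [arch for arch in available if arch <= device_arch]
    return max(compatible) if compatible else None


print(select_arch_set(75))  # 75: a native set exists
print(select_arch_set(80))  # 79: hypothetical arch with no native set
print(select_arch_set(69))  # None: older than every published set
```

Returning `None` for an arch older than every published set leaves it to the caller to decide how to handle an unsupported device.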

The DreamShaper-7 "Fast" bins live under the dreamshaper7/ directory (with their own manifest.json); the app downloads from this variant. The repo also retains the original base SD1.5 ours-temb sets (top-level v73/…v81/ + root manifest.json) and the AI-Hub off-the-shelf SD1.5 sets for breadth.

| File (per dreamshaper7/<arch>/ dir) | Role |
| --- | --- |
| unet.bin | UNet HTP context binary (DreamShaper-7 W8A16) |
| vae_decoder.bin | VAE decoder HTP context binary |
| text_encoder.bin | CLIP text-encoder QNN binary |
| time_mlp.bin | host time-embedding weights (DreamShaper-specific) |
| tokenizer.json, config.json | tokenizer + pipeline config |

The arch-independent CPU CLIP front-end (clip_sd15_base/clip_v2.mnn, token_emb.bin, pos_emb.bin) is shared by every arch and reused across the base + DreamShaper variants.

See dreamshaper7/manifest.json for the authoritative per-arch file set (with per-file size + md5) that the app uses to download the correct binaries for the device.
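Because the manifest records a size and md5 for each file, a downloader can check each binary before loading it. The sketch below shows one way to do that check. It takes the expected size and md5 as plain arguments, since the manifest's JSON layout is not described here; `verify_file` is a hypothetical helper, not part of the app.

```python
import hashlib
import os
import tempfile


def verify_file(path, expected_size, expected_md5, chunk_size=1 << 20):
    """Return True if the file at `path` matches the expected size and MD5."""
    # Cheap check first: a size mismatch means a truncated or wrong download.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in chunks so large binaries are never held fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Demo on a throwaway file standing in for a downloaded binary.
payload = b"stand-in for unet.bin contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_path = tmp.name

good = verify_file(demo_path, len(payload), hashlib.md5(payload).hexdigest())
bad = verify_file(demo_path, len(payload), "0" * 32)
os.remove(demo_path)
print(good, bad)  # True False
```

Checking the size before hashing rejects truncated downloads without reading the whole file.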

Usage

This bundle is loaded automatically by the TokForge Android app; it is not a standalone diffusers checkpoint. The app detects the device's Hexagon arch, looks up the matching file set in the manifest, downloads those binaries, and runs them on the device NPU.

License & attribution

Released under CreativeML OpenRAIL-M, matching its base models.

This model is a derivative of Lykon/dreamshaper-7, itself a finetune of stable-diffusion-v1-5/stable-diffusion-v1-5. Please retain this attribution and observe the CreativeML OpenRAIL-M use restrictions.
