Gemma 4 12B-it Assistant — Core AI (MTP draft model)

INT4 Core AI (.aimodel) conversion of google/gemma-4-12B-it-assistant, the multi-token-prediction (MTP) draft model for Gemma 4 12B-it. It is the companion to warshanks/gemma-4-12B-it-coreai and is used for speculative decoding in Wyvern Chat's on-device Core AI provider (macOS 27+, Apple silicon).

The bundle exposes a single inference function, `draft`:

| | name | shape | dtype |
|---|---|---|---|
| input | `input_ids` | [1, 1] | int32 |
| input | `backbone_hidden` | [1, 1, 3840] | bf16 |
| input | `position_ids` | [1, S] | int32 |
| state | `k_cache` / `v_cache` | [48, 1, 8, ctx, 256] | bf16 |
| output | `next_token` | [1, 1] | int32 |
| output | `hidden` | [1, 1, 3840] | bf16 |
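
As a rough sizing aid, the shapes in the table above can be written down and multiplied out. The sketch below is plain Python with no Core AI calls; the helper names `draft_io_shapes` and `kv_cache_bytes` and the context lengths in the demo are hypothetical, chosen only for illustration. Because the caches are shared with the main bundle, the figure is the size of the buffers the draft reads, not extra memory it allocates.

```python
from math import prod

BF16_BYTES = 2  # bfloat16 is 16 bits wide


def draft_io_shapes(ctx: int, s: int = 1) -> dict:
    """Tensor shapes from the table; `ctx` and `s` are symbolic in the card."""
    return {
        "input_ids": (1, 1),
        "backbone_hidden": (1, 1, 3840),
        "position_ids": (1, s),
        "k_cache": (48, 1, 8, ctx, 256),
        "v_cache": (48, 1, 8, ctx, 256),
        "next_token": (1, 1),
        "hidden": (1, 1, 3840),
    }


def kv_cache_bytes(ctx: int) -> int:
    """Combined size of k_cache and v_cache in bytes for a given context length."""
    shapes = draft_io_shapes(ctx)
    elements = prod(shapes["k_cache"]) + prod(shapes["v_cache"])
    return BF16_BYTES * elements


if __name__ == "__main__":
    # Hypothetical context lengths, not values stated in the card.
    for ctx in (4096, 8192):
        print(f"ctx={ctx}: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

The cache size grows linearly with `ctx`, so doubling the context length doubles the buffer size.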

The draft cross-attends the main model's KV cache (layers 46/47), so pass the same Metal buffers used for the main bundle; no copy is needed. `backbone_hidden` is the main model's post-final-norm hidden state, i.e. the `hidden` output of the main bundle's `main` / `prefill_multimodal` functions. The bundle embeds an INT4 copy of the main model's embedding table, so each draft step needs only the previous token id, not its embedding.
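
To make the data flow concrete, here is a minimal sketch of one greedy speculative-decoding cycle built around a draft function with the interface described above (token id and hidden state in, next token and hidden state out). It is not Wyvern Chat's implementation and makes no Core AI calls: `toy_draft` and `toy_main` are hypothetical stand-ins, the KV cache is omitted, and feeding each draft step's `hidden` output into the next step is an assumption about how the drafter is chained.

```python
from typing import Callable, List, Tuple

Hidden = Tuple[float, ...]  # stand-in for the [1, 1, 3840] bf16 tensor
DraftFn = Callable[[int, Hidden], Tuple[int, Hidden]]
MainFn = Callable[[List[int]], Tuple[List[int], List[Hidden]]]


def speculative_step(last_token: int, backbone_hidden: Hidden,
                     draft: DraftFn, main: MainFn, k: int
                     ) -> Tuple[List[int], Hidden]:
    """Draft k tokens, verify them with the main model, return accepted tokens."""
    proposals: List[int] = []
    tok, hid = last_token, backbone_hidden
    for _ in range(k):
        tok, hid = draft(tok, hid)  # assumption: hidden output is fed back in
        proposals.append(tok)

    # One main-model pass over the last token plus all proposals.
    # targets[i] is the main model's greedy choice after input position i.
    targets, hiddens = main([last_token] + proposals)

    accepted: List[int] = []
    for i, proposed in enumerate(proposals):
        if proposed != targets[i]:
            break
        accepted.append(proposed)

    n = len(accepted)
    accepted.append(targets[n])  # correction on mismatch, bonus token otherwise
    return accepted, hiddens[n]  # hidden state that produced the new last token


def generate(first_token: int, hidden: Hidden, draft: DraftFn, main: MainFn,
             k: int, n_tokens: int) -> List[int]:
    """Repeat speculative cycles until n_tokens have been produced."""
    out: List[int] = []
    tok, hid = first_token, hidden
    while len(out) < n_tokens:
        new_tokens, hid = speculative_step(tok, hid, draft, main, k)
        out.extend(new_tokens)
        tok = out[-1]
    return out[:n_tokens]


def toy_main(tokens: List[int]) -> Tuple[List[int], List[Hidden]]:
    """Hypothetical main model: next token depends only on the current one."""
    return [(t * 3 + 1) % 10 for t in tokens], [(float(t),) for t in tokens]


def toy_draft(token: int, hidden: Hidden) -> Tuple[int, Hidden]:
    """Hypothetical draft: agrees with toy_main except after token 4."""
    nxt = 0 if token == 4 else (token * 3 + 1) % 10
    return nxt, hidden


if __name__ == "__main__":
    print(generate(1, (0.0,), toy_draft, toy_main, k=3, n_tokens=8))
```

Because a drafted token is kept only when it matches the main model's own greedy choice, the output is identical to decoding with the main model alone; the draft only changes how many main-model passes are needed.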

Tokens are drafted greedily with an in-graph argmax over ids in [0, 255999), so special and multimodal tokens are never proposed. A draft step takes ~3 ms on an M4 Max, versus ~21 ms for the 12B main model.
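
The two points above can be illustrated in a few lines: an argmax restricted to ids below 255999, and a back-of-the-envelope speedup estimate from the quoted step times. Both functions are sketches with hypothetical names. The speedup formula is the standard speculative-decoding estimate and rests on assumptions not stated in the card: each drafted token is accepted independently with probability `accept_rate`, and verifying all drafted tokens costs about one main-model step. No acceptance rate has been measured here.

```python
from typing import Sequence


def greedy_draft_token(logits: Sequence[float], limit: int = 255_999) -> int:
    """Argmax restricted to token ids in [0, limit); higher ids are never chosen."""
    end = min(limit, len(logits))
    return max(range(end), key=logits.__getitem__)


def expected_tokens_per_cycle(accept_rate: float, k: int) -> float:
    """Expected tokens emitted per draft-and-verify cycle with k drafted tokens."""
    if accept_rate >= 1.0:
        return float(k + 1)
    return (1.0 - accept_rate ** (k + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, k: int,
                      draft_ms: float = 3.0, main_ms: float = 21.0) -> float:
    """Throughput relative to plain decoding, under the assumptions above."""
    cycle_ms = k * draft_ms + main_ms  # k draft steps plus one verification pass
    return expected_tokens_per_cycle(accept_rate, k) * main_ms / cycle_ms


if __name__ == "__main__":
    # The acceptance rates below are hypothetical, for illustration only.
    for rate in (0.5, 0.7, 0.9):
        print(rate, round(estimated_speedup(rate, k=3), 2))
```

With an acceptance rate of zero the estimate drops below 1 for any `k > 0`, which reflects the cost of draft steps that never pay off.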

Conversion

Exported with Apple's coreai-torch / coreai-models toolchain using INT4 weight-only quantization (block size 32, symmetric, with clipping). Numerics were verified against the HF reference implementation (logits max-abs-err 3e-4 at S=1500).
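
For readers unfamiliar with the scheme, the following is a minimal sketch of symmetric, weight-only INT4 quantization with one scale per block of 32 values, plus the max-abs-error metric used for verification. It is not the coreai-torch implementation. The integer range [-8, 7], the scale rule `max_abs * clip / 7`, and the `clip` parameter are assumptions about what "symmetric with clipping" means; the toolchain's actual rule may differ.

```python
import math
from typing import List, Sequence, Tuple

BLOCK = 32          # values sharing one scale
QMIN, QMAX = -8, 7  # signed 4-bit range (assumed)

QBlock = Tuple[List[int], float]  # (quantized values, scale)


def quantize_block(block: Sequence[float], clip: float = 1.0) -> QBlock:
    """Symmetric quantization of one block; clip < 1 saturates the largest values."""
    max_abs = max(abs(w) for w in block)
    scale = (max_abs * clip) / QMAX if max_abs > 0 else 1.0
    q = [min(QMAX, max(QMIN, round(w / scale))) for w in block]
    return q, scale


def quantize(weights: Sequence[float], clip: float = 1.0) -> List[QBlock]:
    """Split a flat weight list into blocks of 32 and quantize each one."""
    return [quantize_block(weights[i:i + BLOCK], clip)
            for i in range(0, len(weights), BLOCK)]


def dequantize(blocks: Sequence[QBlock]) -> List[float]:
    """Reconstruct approximate weights from quantized blocks."""
    return [q * scale for qs, scale in blocks for q in qs]


def max_abs_err(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [math.sin(i) for i in range(64)]  # synthetic data, not model weights
    blocks = quantize(weights)
    print("blocks:", len(blocks))
    print("max abs err:", max_abs_err(weights, dequantize(blocks)))
```

With `clip=1.0` no value saturates, so the reconstruction error per weight is bounded by half the block's scale; a smaller `clip` trades saturation of outliers for finer resolution of the remaining values.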

Modifications from the original weights: INT4 quantization; the unused centroid masked-embedder path is dropped (use_ordered_embeddings: false); the main model's embedding table is bundled in.

License

Apache 2.0, same as the base model. Original copyright Google DeepMind. See LICENSE and NOTICE.
