Qwen3.5-4B-pouw

A self-contained PoUW (Proof-of-Useful-Work) model based on Qwen/Qwen3.5-4B. It bundles the full base weights (apache-2.0) with the metadata that lets it mine MatMulToken Proof-of-Useful-Work while it serves. Pulling this one repo is enough to run it; no second download is needed.

MatMulToken's mining is output-preserving: generation is bit-identical to the base model. The eligible transformer matmuls (in_features == common_dim = 2560) are reused as PoW lottery tickets — you serve real text and mine on the same compute, no second matmul.
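The eligibility rule above (in_features == common_dim) can be illustrated with a minimal sketch. The helper name `eligible_matmuls` and the layer names and shapes are hypothetical, chosen only for illustration; they are not read from the real checkpoint or from the MatMulToken plugin.

```python
COMMON_DIM = 2560  # value of common_dim stated in this model card


def eligible_matmuls(layer_shapes, common_dim=COMMON_DIM):
    """Return the names of linear layers whose in_features equals common_dim.

    layer_shapes maps a layer name to an (in_features, out_features) tuple.
    """
    return [
        name
        for name, (in_features, _out_features) in layer_shapes.items()
        if in_features == common_dim
    ]


# Hypothetical shapes for illustration only; not the actual Qwen3.5-4B layout.
example_shapes = {
    "attn.q_proj": (2560, 4096),
    "attn.o_proj": (4096, 2560),
    "mlp.up_proj": (2560, 9216),
    "mlp.down_proj": (9216, 2560),
}

print(eligible_matmuls(example_shapes))  # ['attn.q_proj', 'mlp.up_proj']
```

In this toy layout only the projections that read from the 2560-wide hidden state qualify; layers that project back down to it do not, because their in_features differs.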

It is GPU-agnostic: the kernels are portable Triton/PyTorch with no CUDA build step, so the same code runs on RTX 3090 (sm86), RTX 5090, H100, and B200.

Mining shape

field        value
base model   Qwen/Qwen3.5-4B
modality     text
common_dim   2560
rank         32
mine_layers  16 (overhead dial; layer count)
pipeline     vllm

Mining regime (LLM)

Text LLMs mine during prefill, when many tokens are processed in one forward pass and the matmul input has one row per token, so the row count is large. Single-token decode does not mine (rows ≈ 1), so interactive chat mines far less than long-prompt or batched-prefill serving. Diffusion models process a large token count on every forward pass and therefore mine on every pass. For continuous mining, a diffusion model (see Matmultoken/Z-Image-Turbo-pouw) is the stronger substrate; this LLM repo is intended for prefill-heavy and batch workloads.
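A rough back-of-the-envelope sketch of why chat mines less than prefill-heavy serving is shown here. It models a single request as one prefill pass followed by roughly one single-row decode pass per generated token. The function name `mining_row_share` and the `min_rows` threshold are assumptions made for illustration; this card only states that large row counts mine and rows ≈ 1 does not, and gives no exact cutoff.

```python
def mining_row_share(prompt_tokens, new_tokens, min_rows=2):
    """Fraction of processed token rows that sit in passes with >= min_rows rows.

    min_rows is a hypothetical threshold, not a documented MatMulToken value.
    """
    # One prefill pass over the whole prompt, then one 1-row decode pass
    # per generated token (a simplification of real serving).
    passes = [prompt_tokens] + [1] * new_tokens
    total = sum(passes)
    mined = sum(rows for rows in passes if rows >= min_rows)
    return mined / total if total else 0.0


# Interactive chat: short prompt, long answer.
print(round(mining_row_share(50, 500), 3))   # 0.091
# Prefill-heavy job: long prompt, short answer.
print(round(mining_row_share(4000, 16), 3))  # 0.996
```

Under these assumptions the chat request spends about 9% of its token rows in passes that can mine, while the long-prompt request spends over 99% there.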

Use

# Serve via vLLM with quantization="pouw" (the vLLM-MatMulToken plugin auto-registers it).
from vllm import LLM

llm = LLM(model="Matmultoken/Qwen3.5-4B-pouw", quantization="pouw")  # mines on eligible matmuls while it serves
outputs = llm.generate(["The history of money is"])  # returns a list of RequestOutput objects
print(outputs[0].outputs[0].text)  # generated text; generation is bit-identical to the base model
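For the prefill-heavy and batch workloads this repo targets, a request pattern with long prompts and short completions could look like the sketch below. The helpers `build_prefill_batch` and `run_batch` are hypothetical names introduced here; `LLM`, `SamplingParams`, and `generate` are standard vLLM calls, and the `max_tokens` and `temperature` values are illustrative choices rather than recommended settings.

```python
def build_prefill_batch(documents, instruction):
    """Prepend one instruction to each document, giving long prompts for batched prefill."""
    return [f"{instruction}\n\n{doc}" for doc in documents]


def run_batch(prompts, max_tokens=32):
    """Run all prompts in one generate() call and return the generated texts."""
    # Imported lazily so build_prefill_batch stays usable without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model="Matmultoken/Qwen3.5-4B-pouw", quantization="pouw")
    # Short completions keep the workload dominated by prefill;
    # temperature=0.0 selects greedy decoding for repeatable output.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, sampling_params=params)
    return [out.outputs[0].text for out in outputs]


# Example usage (requires a GPU plus vLLM and the MatMulToken plugin):
#   prompts = build_prefill_batch(["<long document 1>", "<long document 2>"], "Summarize:")
#   for summary in run_batch(prompts):
#       print(summary)
```

Passing the whole prompt list to a single `generate` call lets vLLM schedule the prompts together, which is the long-prompt, many-rows situation the Mining regime section describes.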

Notes

  • The live PoW job and difficulty target always come from the chain at runtime; they are never baked into this repo.
  • GPU kernels compile per architecture on first run (one-time, cached on disk).
  • Published under the Matmultoken organization. The base weights (apache-2.0) are bundled in this repo at a pinned snapshot for a reproducible mining shape; the original model's LICENSE and attribution are preserved in-repo.

Generated by MatMulToken publish_pouw_models.py. License: MIT.
