Instructions for using Matmultoken/Qwen3.5-4B-pouw with libraries and local apps.

Libraries

How to use Matmultoken/Qwen3.5-4B-pouw with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Matmultoken/Qwen3.5-4B-pouw")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Matmultoken/Qwen3.5-4B-pouw")
model = AutoModelForMultimodalLM.from_pretrained("Matmultoken/Qwen3.5-4B-pouw")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Matmultoken/Qwen3.5-4B-pouw with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Matmultoken/Qwen3.5-4B-pouw"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Matmultoken/Qwen3.5-4B-pouw",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
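
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server exposes the OpenAI-compatible `/v1/chat/completions` route used in the curl call above.

```python
import json
import urllib.request


def build_chat_request(model, prompt, image_url=None):
    """Build an OpenAI-style chat payload with one user turn (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(base_url, payload, timeout=60):
    """POST the payload to the chat completions route and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", build_chat_request("Matmultoken/Qwen3.5-4B-pouw", "Describe this image in one sentence.", image_url))` returns the reply text. For the SGLang server, the base URL would be `http://localhost:30000`.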

Use Docker

docker model run hf.co/Matmultoken/Qwen3.5-4B-pouw

SGLang

How to use Matmultoken/Qwen3.5-4B-pouw with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Matmultoken/Qwen3.5-4B-pouw" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Matmultoken/Qwen3.5-4B-pouw",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Matmultoken/Qwen3.5-4B-pouw" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip section above (port 30000).

Docker Model Runner
How to use Matmultoken/Qwen3.5-4B-pouw with Docker Model Runner:
```
docker model run hf.co/Matmultoken/Qwen3.5-4B-pouw
```

Qwen3.5-4B-pouw

A self-contained pouw model based on Qwen/Qwen3.5-4B. It bundles the full base weights (apache-2.0) with the metadata that lets it mine MatMulToken Proof-of-Useful-Work while serving. Pulling this one repo is enough to run it; no second download is needed.

MatMulToken's mining is output-preserving: generation is bit-identical to the base model. The eligible transformer matmuls (those with in_features == common_dim = 2560) are reused as PoW lottery tickets, so you serve real text and mine on the same compute, with no second matmul.
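
The eligibility rule (in_features == common_dim) can be illustrated with a small filter. This is only a sketch: the `LinearShape` type, the layer names, and the out_features values are hypothetical and are not read from the real checkpoint. Only the value 2560 comes from this card.

```python
from dataclasses import dataclass

COMMON_DIM = 2560  # common_dim as stated in this model card


@dataclass(frozen=True)
class LinearShape:
    """Shape of one linear layer (hypothetical stand-in for a real module)."""
    name: str
    in_features: int
    out_features: int


def eligible_layers(layers, common_dim=COMMON_DIM):
    """Return the layers whose input width equals common_dim."""
    return [layer for layer in layers if layer.in_features == common_dim]


# Illustrative shapes only; the real layer names and widths may differ.
layers = [
    LinearShape("blocks.0.attn.q_proj", 2560, 4096),
    LinearShape("blocks.0.attn.o_proj", 4096, 2560),
    LinearShape("blocks.0.mlp.up_proj", 2560, 9216),
    LinearShape("blocks.0.mlp.down_proj", 9216, 2560),
]

print([layer.name for layer in eligible_layers(layers)])
# ['blocks.0.attn.q_proj', 'blocks.0.mlp.up_proj']
```

In this toy list, only the projections that read from the 2560-wide hidden state qualify. Layers that write back into it have a different input width and are skipped.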

It is GPU-agnostic (portable Triton/PyTorch kernels, no CUDA build): RTX 3090 (sm86) → 5090 → H100 → B200, same code.

Mining shape

| field | value |
| --- | --- |
| base model | `Qwen/Qwen3.5-4B` |
| modality | text |
| common_dim | 2560 |
| rank | 32 |
| mine_layers | 16 (overhead dial; layer count) |
| pipeline | vllm |

Mining regime (LLM)

Text LLMs mine during prefill, when many tokens are processed at once (rows = tokens, so the row count is large). Single-token decode does not mine (rows ≈ 1), so interactive chat mines far less than long-prompt or batched-prefill serving. Diffusion models mine on every forward pass because the token count is always large, so for continuous mining a diffusion model (see Matmultoken/Z-Image-Turbo-pouw) is the stronger substrate. This LLM repo is for prefill-heavy or batch workloads.
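
To make the prefill/decode contrast concrete, the sketch below computes what share of processed token rows falls in the prefill pass for two workloads. It is a simplified model: it assumes one prefill pass over the whole prompt and one row per decode step. The function name and the workload numbers are hypothetical. The share is only a rough proxy; this card does not state how the row count translates into an actual mining rate.

```python
def prefill_row_share(prompt_tokens, decode_steps):
    """Fraction of token rows handled in prefill rather than in decode.

    Prefill processes all prompt tokens in one forward pass (rows = prompt_tokens).
    Each decode step processes a single token (rows = 1).
    """
    if prompt_tokens < 0 or decode_steps < 0:
        raise ValueError("token counts must be non-negative")
    total_rows = prompt_tokens + decode_steps
    return prompt_tokens / total_rows if total_rows else 0.0


# Hypothetical workloads, chosen only to show the contrast.
chat_share = prefill_row_share(prompt_tokens=50, decode_steps=500)
batch_share = prefill_row_share(prompt_tokens=8000, decode_steps=100)

print(f"interactive chat: {chat_share:.3f}")   # 0.091
print(f"long-prompt job:  {batch_share:.3f}")  # 0.988
```

Under these assumptions, a short-prompt chat spends under a tenth of its rows in prefill, while a long-prompt job spends almost all of them there.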

Use

# Serve via vLLM with quantization="pouw" (the vLLM-MatMulToken plugin auto-registers it).
from vllm import LLM

llm = LLM(model="Matmultoken/Qwen3.5-4B-pouw", quantization="pouw")  # mines on eligible matmuls while it serves
# generate() returns a list of RequestOutput objects; print the generated text of the first one.
outputs = llm.generate("The history of money is")  # generation is bit-identical to the base model
print(outputs[0].outputs[0].text)

Notes

The live PoW job and difficulty target always come from the chain at runtime; they are never baked into this repo. GPU kernels compile per architecture on first run (a one-time step, cached on disk).
Published under the Matmultoken organization. The base weights (apache-2.0) are bundled in this repo at a pinned snapshot for a reproducible mining shape; the original model's LICENSE and attribution are preserved in-repo.
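
For readers unfamiliar with difficulty targets, the sketch below shows the generic hash-below-target check used by many proof-of-work systems. It is not MatMulToken's actual scheme: this card does not specify the ticket format or the hash function, so the SHA-256 choice, the `meets_target` name, and the example values are all assumptions made for illustration.

```python
import hashlib


def meets_target(ticket: bytes, target: int) -> bool:
    """Generic PoW check: the SHA-256 digest, read as a big-endian integer, must be below target."""
    digest = hashlib.sha256(ticket).digest()
    return int.from_bytes(digest, "big") < target


# Hypothetical targets; in a live deployment the target would be received at runtime.
easy_target = 1 << 256   # every 256-bit digest is below this value
impossible_target = 0    # no digest is below zero

print(meets_target(b"example ticket", easy_target))        # True
print(meets_target(b"example ticket", impossible_target))  # False
```

A lower target means fewer tickets qualify, which is why the target has to be supplied at runtime rather than fixed in a published repo.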

Generated by MatMulToken publish_pouw_models.py. License: MIT.

Downloads last month: 49

Safetensors
Model size: 5B params
Tensor type: BF16, F32

Model tree for Matmultoken/Qwen3.5-4B-pouw
Base model: Qwen/Qwen3.5-4B-Base
Finetuned: Qwen/Qwen3.5-4B
Quantized (240): this model