Qwen3.5-4B-pouw
A pouw shaping repo for omnipearl/Qwen3.5-4B. It contains no
weights, only the metadata that makes omnipearl/Qwen3.5-4B mine OmniPearl Proof-of-Useful-Work while it
serves. The base weights are pulled from the base repo on load.
OmniPearl's mining is output-preserving: generation is bit-identical to the base model. The
eligible transformer matmuls (in_features == common_dim = 2560) are reused as PoW
lottery tickets: you serve real text and mine on the same compute, with no second matmul.
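The eligibility rule above can be sketched as a simple shape filter. This is a minimal illustration, assuming a hypothetical list of linear-layer shapes: the layer names and `out_features` values are made up for the example and are not read from the model, and the plugin's real selection logic may apply further conditions.

```python
from typing import NamedTuple

COMMON_DIM = 2560  # common_dim from this repo's mining shape


class LinearShape(NamedTuple):
    name: str          # hypothetical layer name, for illustration only
    in_features: int
    out_features: int


def eligible_matmuls(layers, common_dim=COMMON_DIM):
    """Return the layers whose input width equals common_dim."""
    return [layer for layer in layers if layer.in_features == common_dim]


# Illustrative shapes only; not the actual Qwen3.5-4B layer list.
example_layers = [
    LinearShape("block0.attn_in", 2560, 4096),
    LinearShape("block0.attn_out", 4096, 2560),
    LinearShape("block0.mlp_up", 2560, 9216),
    LinearShape("block0.mlp_down", 9216, 2560),
]

eligible = eligible_matmuls(example_layers)
print([layer.name for layer in eligible])
```

Only the layers that consume a 2560-wide input pass the filter; layers that project back down to 2560 do not, because the rule is stated on `in_features`.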
It is GPU-agnostic (portable Triton/PyTorch kernels, no CUDA build): the same code runs on RTX 3090 (sm86), 5090, H100, and B200.
Mining shape
| field | value |
|---|---|
| base model | omnipearl/Qwen3.5-4B |
| modality | text |
| common_dim | 2560 |
| rank | 32 |
| mine_layers | 16 (overhead dial; layer count) |
| pipeline | vllm |
Mining regime (LLM)
Text LLMs mine during prefill, when many tokens are processed at once (rows = tokens is large). Single-token decode does not mine (rows ≈ 1), so interactive chat mines far less than long-prompt or batched-prefill serving. Diffusion models mine on every forward pass (the token count is always large), so for continuous mining a diffusion model (see omnipearl/Z-Image-Turbo-pouw) is the stronger substrate; this LLM repo is for prefill-heavy and batch workloads.
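The difference between prefill and decode comes down to how many rows each eligible matmul sees. The sketch below counts rows per forward pass for two simple workloads. `min_rows` is a hypothetical threshold introduced only for illustration (the actual cutoff is not stated here), and the function names are not part of any OmniPearl or vLLM API.

```python
def prefill_rows(prompt_lengths):
    """Rows in one batched prefill pass: every prompt token is a row."""
    return sum(prompt_lengths)


def decode_rows(num_sequences):
    """Rows in one decode step: one new token per active sequence."""
    return num_sequences


def mines(rows, min_rows=256):
    """Hypothetical rule: a pass mines only if it has enough rows.

    min_rows=256 is an assumed placeholder, not a documented value.
    """
    return rows >= min_rows


# Interactive chat: one short prompt, then single-token decode steps.
chat_prefill = prefill_rows([40])
chat_decode = decode_rows(1)

# Batch workload: eight 1024-token prompts prefilled together.
batch_prefill = prefill_rows([1024] * 8)

print(chat_prefill, mines(chat_prefill))
print(chat_decode, mines(chat_decode))
print(batch_prefill, mines(batch_prefill))
```

Under these assumptions the chat session never reaches the threshold, while a single batched prefill of long prompts clears it easily, which is why this repo targets prefill-heavy serving.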
Use
# Serve via vLLM with quantization="pouw" (vLLM-Omni plugin auto-registers it).
from vllm import LLM
llm = LLM(model="omnipearl/Qwen3.5-4B", quantization="pouw") # mines on eligible matmuls while it serves
outputs = llm.generate("The history of money is")  # returns a list of RequestOutput objects
print(outputs[0].outputs[0].text)  # generation is bit-identical to the base model
Notes
- The live PoW job and difficulty target always come from the chain at runtime; they are never baked into this repo. GPU kernels compile per-arch on first run (one-time, cached on disk).
- Published under the omnipearl organization. Base weights are the apache-2.0 mirror at omnipearl/Qwen3.5-4B; original model attribution is preserved there.
Generated by OmniPearl publish_pouw_models.py. License: MIT.