Granite 4.1 (8B) – Q4NX for FastFlowLM on AMD Ryzen™ AI NPU (XDNA2 Only)
Model Summary
This is IBM's Granite 4.1 8B base model converted to Q4NX format for hardware-accelerated inference on AMD Ryzen™ AI processors with XDNA2 NPU, using the FastFlowLM engine.
Q4NX (Q4 NPU eXpress) is FastFlowLM's native packed quantization format — a rearranged Q4_1 layout tuned for the NPU's matrix engine tile sizes and memory access patterns. It is not a GGUF file.
Granite-specific scale factors
Granite 4.1 uses three architecture-level multipliers absent from standard Llama. These are baked directly into the Q4NX weights at conversion time so no runtime changes to FastFlowLM are required:
| Factor | Value | Applied to |
|---|---|---|
embedding_scale |
12.0 | embed_tokens (BF16, element-wise) |
residual_scale |
0.22 | o_proj and down_proj per layer (scales Q4_1 d/m blocks) |
logit_scale |
16.0 | lm_head (scales Q4_1 d/m blocks) |
attn_compensation |
128^(-0.25) ≈ 0.2973 |
q_proj and k_proj per layer — corrects 1/d vs 1/√d attention scale |
Requirements
- FastFlowLM ≥ 0.1.8 with
flm_version: "0.1.8"support - AMD Ryzen™ AI processor with XDNA2 (NPU2) — Strix Point (Ryzen AI 300 series) or later
- XRT runtime installed
Quick start
flm serve --hf Atomic-Germ/Granite-4.1-8B-NPU2
Files
| File | Description |
|---|---|
model.q4nx |
Q4NX quantized weights (safetensors container) |
config.json |
FastFlowLM model configuration |
tokenizer.json |
Tiktoken-based tokenizer (100352 vocab) |
tokenizer_config.json |
Chat template and special tokens |
*.xclbin |
Precompiled NPU kernels (attn, layer, lm_head, mm, dequant) |
Performance (Ryzen AI 9 HX 370)
| Metric | Value |
|---|---|
| Prefill | ~1000–1400 tok/s |
| Decode | ~40–44 tok/s |
| Context window | 131072 tokens |
License
The converted Q4NX weights are a derivative of ibm-granite/granite-4.1-8b-base and are distributed under the same Apache 2.0 license.
Citation
@misc{granite2024,
title={Granite 4.1: Open Foundation Models},
author={IBM Research},
year={2024},
url={https://huggingface.co/ibm-granite/granite-4.1-8b-base}
}
- Downloads last month
- 77
Model tree for Atomic-Germ/Granite-4.1-8B-Function-Calling-NPU2
Base model
ibm-granite/granite-4.1-8b-baseCollection including Atomic-Germ/Granite-4.1-8B-Function-Calling-NPU2
Collection
4 items • Updated