granite-4.1-8b: OpenVINO INT4 (channel-wise, symmetric, code-calibrated AWQ)
OpenVINO IR conversion of ibm-granite/granite-4.1-8b (dense 40-layer transformer, 128k context, Apache-2.0). Quantized INT4 symmetric channel-wise with AWQ + scale-estimation calibration.
The one difference from the sibling granite-4.1-8b-int4-cw-ov: the AWQ/scale-estimation calibration set is real Python source code instead of wikitext2 prose. The hypothesis is that domain-matched activation statistics protect the channels that matter for code generation and editing. On this repo's per-task-type benchmark it scored 20/26 (codegen 9/12) versus the wikitext2-calibrated sibling's 19/26 (codegen 8/12): a small but real gain on codegen (one additional task), slightly faster, with the other task types unchanged.
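As a rough illustration of what "INT4 symmetric channel-wise" (group_size -1) means, the sketch below quantizes each output channel (row) of a weight matrix with a single scale and no zero point. It is a conceptual pure-Python example only: the function names are hypothetical, the choice of mapping the largest magnitude to 7 is an assumption, and it does not reproduce NNCF's actual rounding, AWQ, or scale-estimation steps.

```python
def quantize_int4_symmetric_channelwise(weights):
    """Quantize a matrix given as a list of rows, one row per output channel.

    Each row gets one scale (channel-wise) and no zero point (symmetric).
    Assumption for illustration: the largest magnitude in a row maps to 7.
    """
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # An all-zero row would give scale 0; fall back to 1.0 to avoid dividing by zero.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
        q_row = [max(-8, min(7, round(w / scale))) for w in row]
        quantized.append(q_row)
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Reconstruct approximate float weights from INT4 codes and per-row scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]
```

With one scale per row, the reconstruction error of each weight is bounded by half that row's scale, so a single outlier in a channel coarsens the whole channel. That is the kind of loss AWQ and scale estimation aim to reduce using calibration data.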
Converted with optimum-intel (2.1.0.dev0+314b0c4) / NNCF 3.2.0 / OpenVINO 2026.3 / transformers 5.10.2, via the optimum-intel Python API:
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# corpus = 128 random ~2.4k-char chunks of real Python source
qcfg = OVWeightQuantizationConfig(
    bits=4, sym=True, group_size=-1, ratio=1.0,
    dataset=corpus, awq=True, scale_estimation=True)
model = OVModelForCausalLM.from_pretrained(
    "ibm-granite/granite-4.1-8b", export=True,
    quantization_config=qcfg, compile=False)
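The snippet above leaves `corpus` undefined. One possible way to build such a list is sketched below, assuming a local directory of Python files. The helper name `build_code_corpus`, its parameters, and the sampling scheme are hypothetical; the exact procedure used for this model is not documented here beyond "128 random ~2.4k-char chunks".

```python
import random
from pathlib import Path


def build_code_corpus(root, n_chunks=128, chunk_chars=2400, seed=0):
    """Sample fixed-size character chunks from the .py files under `root`.

    Hypothetical helper: uniform choice of file, then uniform choice of offset.
    """
    rng = random.Random(seed)  # seeded so the calibration set is reproducible
    sources = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        # Skip files too short to yield one full chunk.
        if len(text) >= chunk_chars:
            sources.append(text)
    if not sources:
        raise ValueError(f"no .py file with at least {chunk_chars} chars under {root}")

    corpus = []
    for _ in range(n_chunks):
        text = rng.choice(sources)
        start = rng.randrange(len(text) - chunk_chars + 1)
        corpus.append(text[start:start + chunk_chars])
    return corpus
```

A call such as `corpus = build_code_corpus("path/to/python/sources")` (placeholder path) would then produce a list of strings to pass as `dataset=corpus`. Chunks may overlap or start mid-statement under this scheme, which may or may not match the original calibration set.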
Related
- granite-4.1-8b-int4-cw-ov: same recipe, wikitext2-calibrated (the sibling this variant is compared against).
- granite-4.1-3b-int4-cw-ov: smaller/faster.
Usage (OpenVINO GenAI)
import openvino_genai as ov_genai

# The first argument is the local directory holding the downloaded OpenVINO IR files.
pipe = ov_genai.LLMPipeline("granite-4.1-8b-int4-cw-code-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Write a Python function that merges overlapping intervals.",
                    max_new_tokens=256))
Tested end-to-end as an OpenAI-compatible Continue.dev backend via core-ultra-llm-server.
Provenance
- Base model: ibm-granite/granite-4.1-8b (IBM, Apache-2.0, released 2026-04)
- Recipe: INT4 symmetric channel-wise (group_size -1, ratio 1.0), AWQ + scale estimation
- Calibration: 128 real-Python code chunks (domain-matched), NNCF defaults otherwise
- Conversion date: 2026-06-18
- No finetuning: weights are a direct quantization of the original
Model tree for HarmenWessels/granite-4.1-8b-int4-cw-code-ov
Base model
ibm-granite/granite-4.1-8b