granite-4.1-8b — OpenVINO INT4 (channel-wise, symmetric, code-calibrated AWQ)

OpenVINO IR conversion of ibm-granite/granite-4.1-8b (dense 40-layer transformer, 128k context, Apache-2.0). Quantized INT4 symmetric channel-wise with AWQ + scale-estimation calibration.

The one difference from the sibling granite-4.1-8b-int4-cw-ov: the AWQ/scale-estimation calibration set is real Python source code instead of wikitext2 prose. The hypothesis is that domain-matched activation statistics protect the channels that matter for code generation and editing. On this repo's per-task-type benchmark it scored 20/26 (codegen 9/12) versus the wikitext2-calibrated sibling's 19/26 (codegen 8/12) — a small but real gain on codegen, and slightly faster, with the other task types unchanged.

Converted with optimum-intel (2.1.0.dev0+314b0c4) / NNCF 3.2.0 / OpenVINO 2026.3 / transformers 5.10.2, via the optimum-intel Python API:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# corpus = 128 random ~2.4k-char chunks of real Python source
qcfg = OVWeightQuantizationConfig(
    bits=4, sym=True, group_size=-1, ratio=1.0,
    dataset=corpus, awq=True, scale_estimation=True)
model = OVModelForCausalLM.from_pretrained(
    "ibm-granite/granite-4.1-8b", export=True,
    quantization_config=qcfg, compile=False)

granite-4.1-8b-int4-cw-ov — same recipe, wikitext2-calibrated (the sibling this variant is compared against).
granite-4.1-3b-int4-cw-ov — smaller/faster.

Usage (OpenVINO GenAI)

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("granite-4.1-8b-int4-cw-code-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Write a Python function that merges overlapping intervals.",
                    max_new_tokens=256))

Tested end-to-end as an OpenAI-compatible Continue.dev backend via core-ultra-llm-server.

Provenance

Base model: ibm-granite/granite-4.1-8b (IBM, Apache-2.0, released 2026-04)
Recipe: INT4 symmetric channel-wise (group_size -1, ratio 1.0), AWQ + scale estimation
Calibration: 128 real-Python code chunks (domain-matched), NNCF defaults otherwise
Conversion date: 2026-06-18
No finetuning — weights are a direct quantization of the original

Downloads last month: -

Model tree for HarmenWessels/granite-4.1-8b-int4-cw-code-ov

Base model

ibm-granite/granite-4.1-8b

Quantized

(47)

this model

HarmenWessels
/

granite-4.1-8b-int4-cw-code-ov

granite-4.1-8b — OpenVINO INT4 (channel-wise, symmetric, code-calibrated AWQ)

Related

Usage (OpenVINO GenAI)

Provenance

Model tree for HarmenWessels/granite-4.1-8b-int4-cw-code-ov