granite-4.1-8b β€” OpenVINO INT4 (channel-wise, symmetric, code-calibrated AWQ)

OpenVINO IR conversion of ibm-granite/granite-4.1-8b (dense 40-layer transformer, 128k context, Apache-2.0). Quantized INT4 symmetric channel-wise with AWQ + scale-estimation calibration.

The one difference from the sibling granite-4.1-8b-int4-cw-ov: the AWQ/scale-estimation calibration set is real Python source code instead of wikitext2 prose. The hypothesis is that domain-matched activation statistics protect the channels that matter for code generation and editing. On this repo's per-task-type benchmark it scored 20/26 (codegen 9/12) versus the wikitext2-calibrated sibling's 19/26 (codegen 8/12) β€” a small but real gain on codegen, and slightly faster, with the other task types unchanged.

Converted with optimum-intel (2.1.0.dev0+314b0c4) / NNCF 3.2.0 / OpenVINO 2026.3 / transformers 5.10.2, via the optimum-intel Python API:

from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# corpus = 128 random ~2.4k-char chunks of real Python source
qcfg = OVWeightQuantizationConfig(
    bits=4, sym=True, group_size=-1, ratio=1.0,
    dataset=corpus, awq=True, scale_estimation=True)
model = OVModelForCausalLM.from_pretrained(
    "ibm-granite/granite-4.1-8b", export=True,
    quantization_config=qcfg, compile=False)

Related

Usage (OpenVINO GenAI)

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("granite-4.1-8b-int4-cw-code-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Write a Python function that merges overlapping intervals.",
                    max_new_tokens=256))

Tested end-to-end as an OpenAI-compatible Continue.dev backend via core-ultra-llm-server.

Provenance

  • Base model: ibm-granite/granite-4.1-8b (IBM, Apache-2.0, released 2026-04)
  • Recipe: INT4 symmetric channel-wise (group_size -1, ratio 1.0), AWQ + scale estimation
  • Calibration: 128 real-Python code chunks (domain-matched), NNCF defaults otherwise
  • Conversion date: 2026-06-18
  • No finetuning β€” weights are a direct quantization of the original
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for HarmenWessels/granite-4.1-8b-int4-cw-code-ov

Quantized
(47)
this model