granite-4.1-3b - OpenVINO INT4 (channel-wise, symmetric, AWQ-calibrated on code)

OpenVINO IR conversion of ibm-granite/granite-4.1-3b (dense 40-layer transformer, 128k context, Apache-2.0), quantized to INT4 symmetric channel-wise with AWQ and scale estimation. Calibration uses real Python code instead of wikitext2, with the aim of preserving the weight channels that matter for code-editing and agent-executor workloads.

Converted with optimum-intel (transformers 4.57.6). The calibration corpus is 128 chunks of Python source (~2.4k chars each) rather than a general-text dataset such as wikitext2:

# int4 cw-sym, AWQ + scale-estimation, code-calibrated
# (calibration corpus built by scripts/convert_code_calibrated.py)
optimum-cli export openvino -m ibm-granite/granite-4.1-3b \
  --task text-generation-with-past --weight-format int4 --sym --group-size -1 \
  --awq --scale-estimation --dataset <python-code-corpus> \
  granite-4.1-3b-int4-cw-code-ov
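
The calibration script itself is not reproduced here, so the following is only a minimal sketch of how a corpus of this shape could be assembled. The function names `chunk_source` and `build_corpus`, the line-boundary splitting, and the sorted file order are all assumptions for illustration; only the 128-chunk count and the ~2.4k-character chunk size come from the description above.

```python
from pathlib import Path

CHUNK_CHARS = 2400  # "~2.4k chars each" (exact value is an assumption)
NUM_CHUNKS = 128    # chunk count stated in the description


def chunk_source(text, chunk_chars=CHUNK_CHARS):
    """Split text into chunks of about chunk_chars, breaking only at line ends.

    A single line longer than chunk_chars becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Close the current chunk before it would exceed the target size.
        if current and size + len(line) > chunk_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def build_corpus(root, num_chunks=NUM_CHUNKS):
    """Collect up to num_chunks non-empty chunks from the .py files under root."""
    corpus = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for chunk in chunk_source(text):
            if chunk.strip():
                corpus.append(chunk)
            if len(corpus) == num_chunks:
                return corpus
    return corpus
```

Splitting on line boundaries keeps each sample syntactically closer to real code than a fixed-width character cut would. How the resulting list is handed to the exporter is left to the actual script.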

Companion

The wikitext2-calibrated build is the general-purpose sibling; this build is calibrated on code only, favoring code fidelity over general-text calibration.

Usage (OpenVINO GenAI)

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("granite-4.1-3b-int4-cw-code-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Write a Python function that merges overlapping intervals.",
                    max_new_tokens=256))

Tested end-to-end as an OpenAI-compatible Continue.dev backend via core-ultra-llm-server.

Provenance

  • Base model: ibm-granite/granite-4.1-3b (IBM, Apache-2.0)
  • Recipe: INT4 symmetric channel-wise (group_size -1), AWQ + scale estimation
  • Calibration: 128 chunks of real Python code (to target code-editing workloads); NNCF defaults otherwise
  • No finetuning - weights are a direct quantization of the original
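
To make the recipe line concrete, here is a simplified sketch of what "INT4 symmetric channel-wise (group_size -1)" means: each output channel (one row of a weight matrix) gets a single scale, with no further grouping along the input dimension and no separate zero point. This is an illustration in plain Python, not NNCF's actual implementation; the function names are hypothetical, the choice of `max_abs / 7` as the scale is an assumption, and AWQ, scale estimation, and 4-bit packing are omitted.

```python
INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit range


def quantize_channel_sym_int4(row):
    """Quantize one output channel with a single shared scale (symmetric)."""
    max_abs = max(abs(w) for w in row)
    # Map the largest magnitude to 7; use a dummy scale for all-zero rows.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    q = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in row]
    return q, scale


def dequantize_channel(q, scale):
    """Recover approximate float weights from the integer codes and the scale."""
    return [v * scale for v in q]


def quantize_matrix(weights):
    """Channel-wise quantization: one (codes, scale) pair per row."""
    return [quantize_channel_sym_int4(row) for row in weights]


if __name__ == "__main__":
    weights = [[7.0, -3.0, 1.0, 0.0], [0.02, -0.01, 0.005, 0.0]]
    for q, scale in quantize_matrix(weights):
        print(q, scale, dequantize_channel(q, scale))
```

Because every weight in a row shares one scale, a few large-magnitude weights determine the resolution available to the rest of the channel. AWQ and scale estimation are aimed at reducing that error, and both depend on calibration data.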

Repository: HarmenWessels/granite-4.1-3b-int4-cw-code-ov (quantized from ibm-granite/granite-4.1-3b)