granite-4.1-3b - OpenVINO INT4 (channel-wise, symmetric, AWQ-calibrated on code)
OpenVINO IR conversion of ibm-granite/granite-4.1-3b (dense 40-layer transformer, 128k context, Apache-2.0), quantized to INT4 symmetric channel-wise with AWQ + scale-estimation, calibrated on real Python code instead of wikitext2 - to preserve the weight channels that matter for code editing and agent-executor workloads.
Converted with optimum-intel (transformers 4.57.6); the calibration corpus is 128 chunks of Python source (~2.4k chars each) rather than the usual text dataset:
# int4 cw-sym, AWQ + scale-estimation, code-calibrated
# (calibration corpus built by scripts/convert_code_calibrated.py)
optimum-cli export openvino -m ibm-granite/granite-4.1-3b \
--task text-generation-with-past --weight-format int4 --sym --group-size -1 \
--awq --scale-estimation --dataset <python-code-corpus> \
granite-4.1-3b-int4-cw-code-ov
Companion
The wikitext2-calibrated build is the general-purpose sibling; this build trades some general calibration for code fidelity.
Usage (OpenVINO GenAI)
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline("granite-4.1-3b-int4-cw-code-ov", "GPU", CACHE_DIR="./.ovcache")
print(pipe.generate("Write a Python function that merges overlapping intervals.",
max_new_tokens=256))
Tested end-to-end as an OpenAI-compatible Continue.dev backend via core-ultra-llm-server.
Provenance
- Base model:
ibm-granite/granite-4.1-3b(IBM, Apache-2.0) - Recipe: INT4 symmetric channel-wise (group_size -1), AWQ + scale estimation
- Calibration: 128 chunks of real Python code (code-editing channels), NNCF defaults
- No finetuning - weights are a direct quantization of the original
- Downloads last month
- 21
Model tree for HarmenWessels/granite-4.1-3b-int4-cw-code-ov
Base model
ibm-granite/granite-4.1-3b