Dream-Coder-7B OpenVINO INT8

This is the OpenVINO IR version of the Dream-org/Dream-Coder-v0-Instruct-7B model, optimized for Intel GPUs and CPUs. The model weights have been compressed to INT8 with NNCF to reduce the memory footprint and speed up inference, particularly on integrated graphics.

Dream-Coder is a state-of-the-art discrete diffusion language model that supports any-order code generation and excels at architectural planning.

Usage

This model relies on custom architecture files shipped in the repository, so you must pass trust_remote_code=True when loading the tokenizer and config.

Manual Inference (Python)

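import openvino as ov
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoConfig

# Replace with the actual repository id, or point model_dir at a local copy.
repo_id = "your_hf_username/Dream-Coder-7B-ov-int8"

# core.read_model() needs files on disk, so fetch the repository first.
model_dir = snapshot_download(repo_id)

core = ov.Core()
ov_model = core.read_model(f"{model_dir}/model.xml")
model = core.compile_model(ov_model, "GPU")  # or "CPU"

# The tokenizer and config depend on custom code shipped with the model.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)

# Note: Execution requires a discrete diffusion sampling loop.
# See the repository's diffusion_server.py for the full loop implementation with LocalLeap support.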
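
To give a rough idea of what a discrete diffusion sampling loop does, here is a minimal toy sketch. It is not the code from diffusion_server.py and does not implement LocalLeap. It assumes a confidence-based unmasking strategy, a hypothetical MASK token id, and a hypothetical predict callable standing in for the compiled OpenVINO model. The sequence starts as the prompt followed by mask tokens, and each step commits the most confident predictions until no masks remain.

```python
import math
from typing import Callable, List, Tuple

MASK = -1  # hypothetical mask token id, used only in this sketch

# A predictor returns one (token_id, confidence) pair per sequence position.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def diffusion_decode(prompt: List[int], gen_len: int, steps: int,
                     predict: Predictor) -> List[int]:
    """Iteratively unmask gen_len tokens after the prompt in at most `steps` steps."""
    seq = list(prompt) + [MASK] * gen_len
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        preds = predict(seq)
        # Spread the remaining masked positions evenly over the remaining steps,
        # so the final step always fills whatever is left.
        k = math.ceil(len(masked) / (steps - step))
        # Commit the k masked positions with the highest confidence, in any order.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq


def toy_predict(seq: List[int]) -> List[Tuple[int, float]]:
    """Stand-in for the model: proposes token 100 + i, more confident at earlier positions."""
    return [(100 + i, 1.0 / (1 + i)) for i in range(len(seq))]


result = diffusion_decode([1, 2, 3], gen_len=4, steps=2, predict=toy_predict)
print(result)
```

In a real setup, predict would run the compiled model on the current sequence and derive the per-position token and confidence from its logits. The number of steps trades speed against quality, since fewer steps mean more tokens are committed at once.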

Optimization Details

  • Quantization: NNCF Weight-Only Quantization (INT8_ASYM)
  • Target Hardware: Intel integrated GPUs (e.g., Gen9.5 UHD 620) and CPUs.
  • Patches Applied: The standard OpenVINO conversion path is patched to work around aten::cat axis issues in the custom attention block.
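
As an illustration of what asymmetric INT8 weight quantization means, the following is a simplified pure-Python sketch, not NNCF's implementation. It maps one group of float weights onto the unsigned range [0, 255] using a scale and a zero point, then reconstructs approximate floats. The function names and the single-group treatment are assumptions made for this example. In NNCF, the corresponding entry point is nncf.compress_weights with mode=nncf.CompressWeightsMode.INT8_ASYM, and real implementations typically quantize per output channel.

```python
from typing import List, Tuple


def quantize_int8_asym(weights: List[float]) -> Tuple[List[int], float, int]:
    """Quantize floats to unsigned 8-bit levels with a scale and zero point."""
    w_min, w_max = min(weights), max(weights)
    # One quantization step; fall back to 1.0 if all weights are equal.
    scale = (w_max - w_min) / 255 or 1.0
    # Integer level that represents the float value 0.0.
    zero_point = round(-w_min / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Reconstruct approximate float weights from the quantized levels."""
    return [(v - zero_point) * scale for v in q]


weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
q, scale, zero_point = quantize_int8_asym(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each weight is stored in one byte instead of two (FP16) or four (FP32), at the cost of a reconstruction error of roughly half a quantization step per weight. With weight-only quantization, activations stay in floating point.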

Repository

For the complete server implementation, including the discrete diffusion sampling loop and optimizations like LocalLeap designed for Intel integrated graphics, please visit the main project repository: https://github.com/naranor/openvino-gpu-llm-server
